Managed orchestration of virtual machine instance migration

ABSTRACT

A virtual machine running on a source host is determined to be migrated away from the source host. The virtual machine is migrated away from the source host at least by a target host being selected for the virtual machine and a state of the virtual machine being copied from the source host to the target host while the virtual machine continues to run on the source host. The virtual machine is further migrated from the source host by propagating to the target host a change to the state of the virtual machine that occurred on the source host during the copying. The virtual machine is run on the target host such that the virtual machine running on the target host includes the change to the state.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and is a continuation of U.S. patent application Ser. No. 14/750,991, filed Jun. 25, 2015, entitled “MANAGED ORCHESTRATION OF VIRTUAL MACHINE INSTANCE MIGRATION,” which is incorporated by reference for all purposes. This application also incorporates by reference for all purposes the full disclosures of co-pending U.S. patent application Ser. No. 14/750,978, filed Jun. 25, 2015, now U.S. Pat. No. 10,228,969, entitled “OPTIMISTIC LOCKING IN VIRTUAL MACHINE INSTANCE MIGRATION” and U.S. patent application Ser. No. ______, filed Jun. 14, 2019, entitled “MANAGED ORCHESTRATION OF VIRTUAL MACHINE INSTANCE MIGRATION” (0097749-489US2).

BACKGROUND

Modern computer systems are frequently implemented as collections of virtual computer systems operating collectively on one or more host computer systems. The virtual computer systems may utilize resources of the host computer systems such as processors, memory, network interfaces, and storage services. When the resources of a particular host computer system become scarce due to, for example, overutilization by client virtual computer systems, it may become necessary to move a virtual computer system to a different host computer system to avoid reduced system performance, increased system outages or failures, and a degraded user experience.

One approach to the problem of moving or migrating a virtual computer system to a different host computer system is to halt the virtual computer system, copy the memory and/or the system state of the virtual computer system to the different host computer system, and then restart the virtual computer system. However, in the case of a large or complicated virtual computer system, this migration process can take a significant amount of time, and the ability of a user to interact with the virtual computer system during that time period may be eliminated or at least severely restricted. Additionally, some system resources, such as attached storage and network connections, may be volatile, introducing the possibility that the migrated virtual computer system may differ significantly from the original virtual computer system and thereby causing further operational issues.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1 illustrates an example environment where a virtual machine instance is migrated to a new location;

FIG. 2 illustrates an example environment where the migration of a virtual machine instance is managed;

FIG. 3 illustrates an example environment where a workflow associated with the migration of a virtual machine instance is presented;

FIG. 4 illustrates an example process for managing the phases of a virtual machine instance migration;

FIG. 5 illustrates an example environment where the first phase of a virtual machine instance migration is presented;

FIG. 6 illustrates an example environment where the second phase of a virtual machine instance migration is presented;

FIG. 7 illustrates an example environment where the third phase of a virtual machine instance migration is presented;

FIG. 8 illustrates an example environment where the fourth phase of a virtual machine instance migration is presented;

FIG. 9 illustrates an example diagram showing the phases of a virtual machine instance migration;

FIG. 10 illustrates an example state diagram showing the state changes of a virtual machine instance migration;

FIG. 11 illustrates an example environment where requests that may modify a migrating virtual machine instance are classified and processed to provide optimistic locking;

FIG. 12 illustrates an example environment where resources associated with a virtual machine instance migration are managed;

FIG. 13 illustrates an example environment where resources associated with a virtual machine instance migration are managed; and

FIG. 14 illustrates an environment in which various embodiments can be implemented.

DETAILED DESCRIPTION

In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.

Techniques described and suggested herein include methods, systems, and processes for managing the migration of a virtual machine instance from a source host computer system to a target host computer system. The methods, systems, and processes described herein manage the migration of a virtual machine instance in phases and reduce both the length and the impact of a critical migration phase. For example, the length and impact can be minimized by performing a majority of the migration before locking the virtual machine, thus minimizing the amount of time that the virtual machine is unavailable. In some examples, such improvement is attained by optimistically locking the source virtual machine instance during the critical migration phase, classifying application programming interface requests and other requests that are received by the source virtual machine instance during the critical migration phase, and reducing user or customer impact associated with the migration by cancelling and rescheduling the migration in the event that a request whose fulfillment alters the source virtual machine is received during the critical migration phase.

In the first phase, after it has been determined that a running virtual machine instance is a candidate for migration from a first host computer system (also referred to as the “source” or the “source location”) to a suitable second host computer system (also referred to as the “target” or the “target location”), the second host computer system may be prepared for the migration by the migration manager. This preparation may include ensuring that the right operating system and/or applications are running on the target location and that the target location has sufficient resources available to host the virtual machine instance.

In the second phase, a new instance of the virtual machine may then be created on the target by the migration manager with the same configuration as the running virtual machine instance (also referred to as the “original virtual machine instance”), and memory and state information from the original virtual machine instance may be copied to the new virtual machine instance while the original virtual machine instance continues to run.

Prior to locking the original virtual machine instance during the critical migration phase (also referred to as the “flip”), a majority of the memory and/or state of the running virtual machine instance may be copied to the new virtual machine instance so that the difference between the two virtual machines is minimized. This copying may keep the differences between the two virtual machines to a minimum by forwarding any changes to the memory or state of the original virtual machine instance to the new virtual machine instance. Such changes to the memory or state of the original virtual machine may occur as a result of, for example, one or more application programming interface (“API”) requests received by the original virtual machine instance.
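The copy-and-forward behavior described above can be illustrated with a small, single-threaded Python sketch. It assumes memory is modeled as a dictionary from page number to page contents; `Instance`, `PreCopy`, `copy_page`, `write`, and `copy_remaining` are hypothetical names, and a real hypervisor would instead track changed pages in the virtualization layer.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical VM stand-in: memory is a mapping of page number -> contents."""
    pages: dict = field(default_factory=dict)


class PreCopy:
    """Copies pages to the target while guest writes continue on the source.

    A write to a page that has already been copied is forwarded to the
    target, so the target never holds a stale copy of that page.
    """

    def __init__(self, source: Instance, target: Instance) -> None:
        self.source = source
        self.target = target
        self.copied = set()  # page numbers already present on the target

    def copy_page(self, page: int) -> None:
        self.target.pages[page] = self.source.pages[page]
        self.copied.add(page)

    def write(self, page: int, data: bytes) -> None:
        """Apply a guest write to the still-running original instance."""
        self.source.pages[page] = data
        if page in self.copied:
            # Already copied: forward the change instead of re-copying later.
            self.target.pages[page] = data

    def copy_remaining(self) -> None:
        """Copy every page that has not yet reached the target."""
        for page in list(self.source.pages):
            if page not in self.copied:
                self.copy_page(page)
```

In this sketch, a write to a page that has already been copied is forwarded immediately, while a write to a page that has not yet been copied is simply picked up when that page is copied.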

In the third phase, the original virtual machine instance may then be locked by the migration manager so that the final changes to the memory and/or state of the original virtual machine instance may be propagated to the new virtual machine instance, ensuring that the two virtual machine instances are sufficiently the same so as to not disrupt the user experience. In an embodiment, the final changes to the memory and/or state of the original virtual machine instance can be propagated to the new virtual machine instance so that the two virtual machine instances are identical. This phase, the flip phase, must be kept as short as possible so that the user experience is not degraded due to a perception that the original virtual machine is locked.

While the flip is in progress, the original virtual machine instance may be optimistically locked in that any additional API requests received by the original virtual machine may be classified according to whether they cause changes (“mutations”) to the original virtual machine instance, whether they cause mutations to the source location, or whether they do not cause any mutations. The classification of the API requests may be based on a categorization of one or more types of API requests. For example, API requests of the type that describe resources may have a “describe” categorization and, based on the fact that API requests that merely describe resources are non-mutating, all API requests of the “describe” categorization may be assigned a non-mutating classification. Each request may have one or more classifications or categorizations, which may be predetermined and/or may be selected from a set of classifications or categorizations.
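One way to express the categorization-based classification is a pair of lookup tables, sketched below in Python. The request names, category names, and the rule that unknown requests are treated as mutating are assumptions made for illustration; the three classifications (non-mutating, mutating the instance, mutating the source location) follow the text above.

```python
from enum import Enum


class Classification(Enum):
    NON_MUTATING = "non-mutating"
    MUTATES_INSTANCE = "mutates-instance"
    MUTATES_SOURCE = "mutates-source-location"


# Hypothetical categorization: every category maps to one classification.
CATEGORY_TO_CLASSIFICATION = {
    "describe": Classification.NON_MUTATING,
    "modify-instance": Classification.MUTATES_INSTANCE,
    "modify-host": Classification.MUTATES_SOURCE,
}

# Hypothetical request types; a request may carry more than one category.
REQUEST_CATEGORIES = {
    "describe_instance": ("describe",),
    "describe_volumes": ("describe",),
    "attach_volume": ("modify-instance",),
    "resize_instance": ("modify-instance", "modify-host"),
}


def classify(request_type: str) -> set:
    """Return every classification that applies to a request type."""
    categories = REQUEST_CATEGORIES.get(request_type)
    if categories is None:
        # Unknown request types are conservatively treated as mutating.
        return {Classification.MUTATES_INSTANCE}
    return {CATEGORY_TO_CLASSIFICATION[c] for c in categories}


def is_non_mutating(request_type: str) -> bool:
    """True only when every classification of the request is non-mutating."""
    return classify(request_type) == {Classification.NON_MUTATING}
```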

Those API requests that do not cause (i.e., whose fulfillment does not cause) any mutations may generally be allowed. Those API requests that cause mutations may cause the migration to be terminated and rescheduled for a later time. Alternatively, those API requests that cause mutations may be blocked (or queued) until the flip is complete and then sent to the new virtual machine instance. Some API requests received by the original virtual machine instance may be unblockable, such as, for example, those that change the fundamental state of the original virtual machine instance or those that require a significant amount of time to complete. Such unblockable API requests may also cause the migration to be cancelled and/or rescheduled for a later time.
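The handling rules above can be sketched as a small gate object that is consulted for each request while the flip is in progress. This is a hypothetical Python illustration, assuming each request arrives already labeled as mutating or not and as blockable or not; `FlipGate`, `Action`, and `queue_mutations` are invented names. The `queue_mutations` flag selects between the two policies described above: queueing blockable mutations, or cancelling on any mutation.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    QUEUE = "queue-until-flip-completes"
    CANCEL = "cancel-and-reschedule"


class FlipGate:
    """Optimistic lock applied to the original instance while the flip runs."""

    def __init__(self, queue_mutations: bool = True) -> None:
        self.queue_mutations = queue_mutations
        self.queued = deque()
        self.cancelled = False

    def submit(self, request: str, mutating: bool, blockable: bool = True) -> Action:
        if not mutating:
            return Action.ALLOW
        if blockable and self.queue_mutations:
            self.queued.append(request)
            return Action.QUEUE
        # Unblockable (or, under the strict policy, any) mutation ends the flip.
        self.cancelled = True
        return Action.CANCEL

    def release_to_new_instance(self) -> list:
        """After a successful flip, hand the queued requests to the new instance."""
        released = list(self.queued)
        self.queued.clear()
        return released
```

In this sketch, blockability is only consulted for mutating requests; that is a simplifying assumption rather than a rule stated in the text.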

In the fourth phase, if the flip completes successfully, access to the new virtual machine instance may be provided to the user, connections to resources associated with the original virtual machine instance may be terminated, and, after the original virtual machine instance and the new virtual machine instance have converged (i.e., after all pending calls have been received and correctly propagated), the original virtual machine instance may be terminated and resources associated with the original virtual machine instance may be reclaimed (this process is also referred to as “tearing down,” being “torn down,” or as a “tear down”). Conversely, if in the fourth phase the flip does not complete successfully due to an error, a cancellation of the migration, or some other such event, access to the original virtual machine instance may be returned to the user (i.e., it may be unlocked) and the new virtual machine instance may be torn down.
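The four phases and their two possible outcomes can be summarized as a small state machine. The Python sketch below is a simplified, hypothetical model (it is not the state diagram of FIG. 10): cancellation or error rolls the migration back from any non-terminal phase, and entering the fourth phase reports which instance is to be torn down.

```python
from enum import Enum, auto


class Phase(Enum):
    PREPARE = auto()      # phase 1: prepare the target location
    COPY = auto()         # phase 2: copy memory/state while the original runs
    FLIP = auto()         # phase 3: lock the original, propagate final changes
    COMMITTED = auto()    # phase 4 after a successful flip
    ROLLED_BACK = auto()  # phase 4 after an error or cancellation


# Allowed transitions; terminal phases have no successors.
TRANSITIONS = {
    Phase.PREPARE: {Phase.COPY, Phase.ROLLED_BACK},
    Phase.COPY: {Phase.FLIP, Phase.ROLLED_BACK},
    Phase.FLIP: {Phase.COMMITTED, Phase.ROLLED_BACK},
    Phase.COMMITTED: set(),
    Phase.ROLLED_BACK: set(),
}


class Migration:
    def __init__(self) -> None:
        self.phase = Phase.PREPARE

    def advance(self, to: Phase) -> None:
        if to not in TRANSITIONS[self.phase]:
            raise ValueError(f"illegal transition {self.phase.name} -> {to.name}")
        self.phase = to

    def finish(self, flip_succeeded: bool) -> str:
        """Enter phase 4 and report which instance should be torn down."""
        if flip_succeeded:
            self.advance(Phase.COMMITTED)
            return "original"
        self.advance(Phase.ROLLED_BACK)
        return "new"
```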

In an example of how a migration manager may orchestrate the migration of a virtual machine instance, a user may have access to a virtual machine instance running on a first host computer system provided by a computing resource service provider. If it is determined that the virtual machine instance should be migrated to a second host computer system, a new virtual machine instance may be instantiated on that second host computer system, and the process of copying memory and/or state from the virtual machine instance on the first host computer system to the new virtual machine instance on the second host computer system may begin. During this copy, the virtual machine instance will continue to operate on the first host computer system and the user may not have any indication that this phase of the migration process is occurring.

When the copy is complete, and the memory and state of the new virtual machine instance are sufficiently the same as the memory and state of the original virtual machine instance, the original virtual machine instance may be locked. During the lock, the final memory and state of the original virtual machine instance are copied to the new virtual machine instance. Any changes that occur during the lock may either be allowed or blocked. Changes that substantially alter the memory or state of the source virtual machine, and thus increase the time that the original virtual machine may be locked, may cause the in-progress migration to be cancelled and rescheduled for a later time. The cancelling behavior is intended to minimize the amount of time that the original virtual machine is locked so that the user might not perceive that the virtual machine is not responding.
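The trade-off between locking and cancelling can be made concrete with simple arithmetic, assuming the duration of the lock is dominated by copying the remaining changed state: the expected flip time is roughly the outstanding changed bytes divided by the available copy bandwidth. The lock budget, the bandwidth model, and the function names below are illustrative assumptions, not values from this disclosure.

```python
MIB = 2**20
GIB = 2**30


def estimated_flip_seconds(dirty_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to copy the remaining changed state while the original is locked."""
    return dirty_bytes / bandwidth_bytes_per_s


def ready_to_lock(dirty_bytes: int, bandwidth_bytes_per_s: float, budget_s: float) -> bool:
    """Lock only when the final copy is expected to fit within the lock budget."""
    return estimated_flip_seconds(dirty_bytes, bandwidth_bytes_per_s) <= budget_s


def should_cancel(elapsed_s: float, dirty_bytes: int,
                  bandwidth_bytes_per_s: float, budget_s: float) -> bool:
    """During the flip, cancel if outstanding changes push the lock past its budget."""
    remaining_s = estimated_flip_seconds(dirty_bytes, bandwidth_bytes_per_s)
    return elapsed_s + remaining_s > budget_s
```

With a hypothetical 1 GiB/s link and a 100 ms budget, 64 MiB of outstanding changes takes an estimated 62.5 ms, so the original instance may be locked; 256 MiB would take about 250 ms, so copying continues instead. If, 50 ms into the flip, mutations still leave 64 MiB to copy, the projected 112.5 ms exceeds the budget and the migration is cancelled and rescheduled.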

If the flip completes successfully, the new virtual machine instance will then be operable and the user may then have access to the new virtual machine instance that is perceptually identical to the original virtual machine instance. If the flip does not complete successfully, either as a result of an error, a cancellation, or some other such event, the original virtual machine instance will be unlocked and the user will continue to have access to the original virtual machine instance. The cancelled migration may then be rescheduled for a later time.

FIG. 1 illustrates an example environment 100 where a virtual machine instance is migrated to a new location in accordance with at least one embodiment. One or more virtual machine instances may be operating on host computer systems provided by a computing resource service provider 102 as described herein. In the example illustrated in FIG. 1, a first virtual machine instance (the original VM instance 114) is running in a first location (the source location 110). The first location may be one or more host computer systems configured to provide shared hardware to a virtual computer system service for the instantiation of one or more virtual machine instances. The original VM instance 114 may be one of a plurality of virtual machine instances associated with the source location 110. Each of the plurality of virtual machine instances associated with the source location 110 may be running, may be paused, may be suspended (e.g., paused and stored to secondary storage), or may be in some other state. In the example illustrated in FIG. 1, the original VM instance 114 is running (i.e., is performing one or more operations).

In the course of the operation of the original VM instance 114, it may be determined that the original VM instance 114 can be migrated from the source location 110 to a target location 112. The determination that the original VM instance 114 can be migrated from the source location 110 to a target location 112 may be made as a result of changes in the availability of resources at the source location 110 (e.g., a shortage of computing power, a shortage of memory, or a lack of network bandwidth). The determination that the original VM instance 114 can be migrated from the source location 110 to a target location 112 may also be made to move the original VM instance 114 logically closer to one or more computing resource service provider resources. The determination that the original VM instance 114 can be migrated from the source location 110 to a target location 112 may include determining one or more candidate locations from a set of available candidate locations based on resource availability, location, cost, or other selection criteria.
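Selecting a target from a set of available candidate locations can be sketched as a filter on resource availability followed by a preference ordering. In the hypothetical Python sketch below, the `Candidate` fields, the use of network hops as a proxy for logical closeness, and the cost-then-closeness ordering are all assumptions; an actual placement service may weigh these criteria differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical description of a possible target location."""
    name: str
    free_vcpus: int
    free_memory_gib: int
    hourly_cost: float
    network_hops: int  # stand-in for "logical closeness" to needed resources


def select_target(candidates: Sequence[Candidate], vcpus: int, memory_gib: int,
                  source_name: str) -> Optional[Candidate]:
    """Filter on resource availability, then prefer lower cost, then closeness."""
    eligible = [
        c for c in candidates
        if c.name != source_name
        and c.free_vcpus >= vcpus
        and c.free_memory_gib >= memory_gib
    ]
    if not eligible:
        return None  # no suitable target; the migration can be retried later
    # Ties on cost are broken by closeness, then by name for determinism.
    return min(eligible, key=lambda c: (c.hourly_cost, c.network_hops, c.name))
```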

The determination that the original VM instance 114 can be migrated from the source location 110 to a target location 112 may also be made by a customer request to, for example, reduce one or more costs associated with the original VM instance 114. The determination that the original VM instance 114 can be migrated from the source location 110 to a target location 112 may also be made by a service, process, or module operating in association with the computing resource service provider that may be configured to determine more optimal locations for virtual machine instances. In the example illustrated in FIG. 1, the target location 112 is shown within the computing resource service provider 102. In an embodiment, either the source location 110, the target location 112, or both can be outside of the computing resource service provider 102 (e.g., they may be provided by customer and/or other third-party environments).

The request to migrate the original VM instance 114 from the source location 110 to the target location 112 may be received by a migration manager 104 operating with the computing resource service provider 102. In an embodiment, the migration manager 104 is implemented as a service that may be one of a plurality of services provided by the computing resource service provider 102. The migration manager 104 may also be referred to herein as a migration manager computer system and, in some embodiments, can be implemented as a distributed computer system as described herein.

When migrating the original VM instance 114 from the source location 110 to the target location 112, a number of systems, services, processes, and resources may be communicating with the original VM instance 114. These systems, services, processes, and resources cannot generally be guaranteed to change their behavior simultaneously so that their communications switch from the original VM instance 114 at the source location 110 to a new VM instance 116 at the target location 112. The migration manager 104 may be configured to communicate with each of the plurality of systems, services, processes, and resources in order to manage the migration.

The migration manager 104 may be configured to manage (or orchestrate) the migration by selecting one or more operations to perform based at least in part on the state of the migration and/or the classification of one or more requests (e.g., application programming interface requests) and then by performing those selected operations. For example, the migration manager may select and perform one or more operations to determine the proper order for migration, manage a workflow for migration, issue commands to the systems, services, processes, and resources associated with the migration, determine whether the migration is successful, start and stop virtual machine instances, determine whether the migration has failed, determine whether the migration should be cancelled, and manage a migration rollback if errors occur.

During a migration, each of the plurality of systems, services, processes, and resources associated with the migration may only be made aware of their portion of the migration. The migration manager 104 may manage the migration in phases as described herein and may manage the migration of each of the plurality of systems, services, processes, and resources associated with the migration by issuing API requests, making library calls, using interfaces (e.g., a web interface), or by some other means. The phase of a migration (also referred to herein as the “current state of the migration”) may determine whether requests such as application programming interface requests may be allowed or blocked, and may also be used to determine whether a migration should be cancelled. The migration manager 104 may also manage timeouts for each of the phases and/or for each migration action associated with each of the plurality of systems, services, processes, and resources associated with the migration, which may also be used to determine whether a migration should be cancelled. For example, a block storage service may, during a migration, receive an API request from the migration manager 104 to provide access to a block storage device to the new VM instance 116. As part of this access, the block storage service may need to synchronize input-output (“I/O”) requests between the original VM instance 114 and the new VM instance 116. The migration manager 104 may establish a timeout value for this synchronization so that, for example, if the block storage service does not respond to the API request in a reasonable amount of time, the migration may be cancelled.
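A per-action timeout of the kind described for block storage synchronization might look like the following Python sketch, which uses `concurrent.futures` from the standard library. `run_with_timeout`, `MigrationCancelled`, and `slow_block_storage_sync` are hypothetical names; the sleep merely simulates a service that does not respond in time.

```python
import concurrent.futures
import time


class MigrationCancelled(Exception):
    """Raised when a migration action does not finish within its timeout."""


def run_with_timeout(action, timeout_s: float):
    """Run one migration action; cancel the migration if it takes too long."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(action)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as exc:
        name = getattr(action, "__name__", "action")
        raise MigrationCancelled(f"{name} exceeded {timeout_s}s") from exc
    finally:
        # Stop waiting; the worker thread cannot be killed and finishes on its own.
        pool.shutdown(wait=False)


def slow_block_storage_sync():
    """Hypothetical stand-in for a block storage I/O synchronization call."""
    time.sleep(0.3)
    return "synced"
```

Because a Python thread cannot be forcibly stopped, the sketch only stops waiting for the action; undoing any partial work would be part of the migration rollback managed by the migration manager.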

When the request to migrate the original VM instance 114 from the source location 110 to the target location 112 is received by a migration manager 104 operating with the computing resource service provider 102, one or more commands 106 may be generated by the migration manager 104 in response to that request. The one or more commands 106 may then be sent to a system manager 108 operating with the computing resource service provider 102. In an embodiment, the system manager 108 is implemented as a service that may be one of a plurality of services provided by the computing resource service provider 102. The system manager 108 may be configured to manage resources of a computing resource service provider 102 where such resources may be provided by computer systems in a distributed and/or virtual computing environment.

The one or more commands 106 that may be sent from the migration manager 104 to the system manager 108 in response to the request to migrate may include commands to configure the target location to instantiate a new virtual machine instance, commands to instantiate a new virtual machine instance at the target location 112, commands to copy the memory and/or state from the original VM instance 114 to a new VM instance 116, commands to deactivate the original VM instance 114, commands to activate the new VM instance 116, commands to lock either the original VM instance 114 or the new VM instance 116, commands to pause either the original VM instance 114 or the new VM instance 116, commands to unpause either the original VM instance 114 or the new VM instance 116, commands to forward memory and/or state information from the original VM instance 114 to the new VM instance 116, commands to tear down the original VM instance 114, commands to terminate a migration between the source location 110 and the target location 112, and other such commands associated with the migration 118 of the original VM instance 114 from the source location 110 to the target location 112.
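The command channel between the migration manager and the system manager can be modeled as a narrow interface, which also makes the orchestration order easy to test with a recording double. In this hypothetical Python sketch, the `Command` verbs and identifiers are invented for illustration, and only the commands leading up to the flip are shown.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass(frozen=True)
class Command:
    verb: str              # hypothetical verbs, e.g. "instantiate", "lock"
    args: Tuple[str, ...]  # instance or location identifiers


class SystemManager(Protocol):
    def send(self, command: Command) -> None: ...


class RecordingSystemManager:
    """Test double that records commands instead of acting on real hosts."""

    def __init__(self) -> None:
        self.log: List[Command] = []

    def send(self, command: Command) -> None:
        self.log.append(command)


def issue_pre_flip_commands(mgr: SystemManager, original: str, new: str,
                            target_location: str) -> None:
    """Send, in order, the commands that lead up to the flip."""
    plan = [
        ("configure-target", (target_location,)),
        ("instantiate", (new, target_location)),
        ("copy-state", (original, new)),
        ("forward-changes", (original, new)),
        ("lock", (original,)),
    ]
    for verb, args in plan:
        mgr.send(Command(verb, args))
```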

The original VM instance 114 may have access 122 to one or more resources and services 120 provided by the computing resource service provider 102. For example, the computing resource service provider may provide access 122 to resources and services 120 such as network interfaces, storage services, authentication services, authorization services, and/or other resources and services. As part of the migration 118 of the original VM instance 114 from the source location 110 to a target location 112, the migration manager 104 and/or the system manager 108 may instantiate a new VM instance 116 at the target location 112 and may provide access 124 to the same resources and services 120 as may be provided to the original VM instance 114.

FIG. 2 illustrates an example environment 200 where the migration of a virtual machine instance is managed as described in FIG. 1 and in accordance with at least one embodiment. A user 202 may connect 206 to one or more services 212 through a computer system client device 204. The services 212 may be provided by a computing resource service provider 210. In some embodiments, the computing resource service provider 210 may provide a distributed, virtualized, and/or datacenter environment within which one or more applications, processes, services, virtual machines, and/or other such computer system entities may be executed. In some embodiments, the user 202 may be a person, or may be a process running on one or more remote computer systems, or may be some other computer system entity, user, or process.

The command or commands to connect to the computer system instance may originate from an outside computer system and/or server, or may originate from an entity, user or process on a remote network location, or may originate from an entity, user or process within the computing resource service provider, or may originate from a user of the computer system client device 204, or may originate as a result of an automatic process, or may originate as a result of a combination of these and/or other such origin entities. In some embodiments, the command or commands to initiate the connection 206 to the computing resource service provider 210 may be sent to the services 212, without the intervention of the user 202. The command or commands to initiate the connection 206 to the services 212 may originate from the same origin as the command or commands to connect to the computing resource service provider 210, or may originate from another computer system and/or server, or may originate from a different entity, user, or process on the same or a different remote network location, or may originate from a different entity, user, or process within the computing resource service provider, or may originate from a different user of a computer system client device 204, or may originate as a result of a combination of these and/or other such same and/or different entities.

The user 202 may request connection to the computing resource service provider 210 via one or more connections 206 and, in some embodiments, via one or more networks 208 and/or entities associated therewith, such as servers connected to the network, either directly or indirectly. The computer system client device 204 that may request access to the services 212 may include any device that is capable of connecting with a computer system via a network, including at least servers, laptops, mobile devices such as smartphones or tablets, other smart devices such as smart watches, smart televisions, set-top boxes, video game consoles and other such network-enabled smart devices, distributed computer systems and components thereof, abstracted components such as guest computer systems or virtual machines, and/or other types of computing devices and/or components. The network may include, for example, a local network, an internal network, a public network such as the Internet, or other networks such as those listed or described below. The network may also operate in accordance with various protocols such as those listed or described below.

The computing resource service provider 210 may provide access to one or more host machines, as well as provide access to one or more virtual machine (VM) instances as may be operating thereon. The services 212 provided by the computing resource service provider 210 may also be implemented as and/or may utilize one or more VM instances as may be operating on the host machines. For example, the computing resource service provider 210 may provide a variety of services to the user 202 and the user 202 may communicate with the computing resource service provider 210 via an interface such as a web services interface or any other type of interface. While the example environment illustrated in FIG. 2 shows a single connection or interface for the services 212 of the computing resource service provider 210, each of the services may have its own interface and, generally, subsets of the services may have corresponding interfaces in addition to or as an alternative to the single interface.

The computing resource service provider 210 may provide various services such as the services 212 to its users or customers. The services provided by the computing resource service provider 210 may include, but are not limited to, virtual computer system services, block-level data storage services, cryptography services, on-demand data storage services, notification services, authentication services, policy management services, or other services. Not all embodiments described may include all of these services, and additional services may be provided in addition to or as an alternative to the services explicitly described. As described above, each of the services 212 may include one or more web service interfaces that enable the user 202 to submit appropriately configured API requests to the various services through web service requests. In addition, each of the services 212 may include one or more service interfaces that enable the services to access each other (e.g., to enable a virtual machine instance provided by the virtual computer system service to store data in or retrieve data from an on-demand data storage service and/or to access one or more block-level data storage devices provided by a block-level data storage service).

In an example, a virtual computer system service may be a collection of computing resources configured to instantiate virtual machine instances on behalf of a customer such as the user 202. The customer may interact with the virtual computer system service (via appropriately configured and authenticated API requests) to provision and operate virtual machine instances that are instantiated on physical computing devices hosted and operated by the computing resource service provider 210. The virtual computer system service may also be configured to initiate the migration of virtual machine instances as described herein. The virtual machine instances may be used for various purposes, such as to operate as servers supporting a website, to operate business applications or, generally, to serve as computing power for the customer. Other applications for the virtual machine instances may be to support database applications, electronic commerce applications, business applications, and/or other applications.

In another example, a block-level data storage service may comprise one or more computing resources that collectively operate to store data for a customer using block-level storage devices (and/or virtualizations thereof). The block-level storage devices of the block-level data storage service may, for example, be operationally attached to virtual machine instances provided by the virtual computer system service described herein to serve as logical units (e.g., virtual drives) for the computer systems. A block-level storage device may enable the persistent storage of data used/generated by a corresponding virtual machine instance where the virtual computer system service may only provide ephemeral data storage for the virtual machine instance.

In the example illustrated in FIG. 2, the one or more services 212 may be implemented as, or may be supported by, one or more virtual machine instances as described above. For example, the one or more services 212 may include an original VM instance 216 visible to the user 202 (i.e., configured such that the user 202 may use and/or otherwise interact with the original VM instance 216). The original VM instance 216 may be running at a first, or source, location 214, as described above. Upon receiving a command to migrate the original VM instance 216 from the source location 214 to a target location 222, a migration manager 218 may direct the system manager 220 to begin the migration from the source location 214 to the target location 222 as described above. The migration may be accomplished by instantiating a new VM instance 224 at the target location 222 and copying memory and/or state from the original VM instance 216 to the new VM instance 224. The migration may also be accomplished by forwarding 226 memory and/or state changes from the original VM instance 216 to the new VM instance 224. For example, if during the migration, the user 202 alters a memory location on the original VM instance 216 (e.g., as a result of executing an application) after that memory has been copied from the original VM instance 216 to the new VM instance 224, the new memory value may be forwarded to the new VM instance 224. This forwarding 226 of memory and/or state changes may serve to keep the new VM instance 224 synchronized with the original VM instance 216 during migration.

As described herein, the last phase of the migration prior to cleanup is the flip 228. During the flip 228, the original VM instance 216 may have some or all changes locked out so that the user 202 and/or other processes associated with the original VM instance 216 may not alter or mutate the original VM instance 216. During the flip 228, any remaining differences between the original VM instance 216 and the new VM instance 224 may then be copied from the original VM instance 216 to the new VM instance 224. If the flip 228 is successful, the connection 230 from the services 212 to the original VM instance 216 may be replaced by a connection 232 from the services 212 to the new VM instance 224 so that, from the user's perspective, the backing VM instance appears to be the same as before the migration (because, for example, the new VM instance 224 may be substantially the same as the original VM instance 216). If the flip is not successful, the connection 230 from the services 212 to the original VM instance 216 may be retained so that, from the user's perspective, the backing VM instance appears to be the same as before the attempted migration (because it has not changed). Thus, regardless of whether the migration is successful or not (e.g., because of failure or cancellation), the user may still perceive the same system state and may consider the original VM instance 216 and the new VM instance 224 as the same.

In an embodiment, after the flip 228, if the flip is successful, the original VM instance 216 is no longer accessible to the user 202 and/or to the services 212. After the flip 228, if the flip is not successful, the new VM instance 224 is not accessible to the user 202 and/or to the services 212. This ensures that, after the flip, only one of the two virtual machine instances is available to the user 202 and/or to the services 212. As part of the flip 228, the migration manager 218 and/or one or more agents or services under the direction of the migration manager 218 will enable at most one of the virtual machine instances by, for example, unpausing at most one paused virtual machine instance, unlocking at most one locked virtual machine instance, enabling at most one disabled virtual machine instance, or a combination of these or other operations to cause at most one virtual machine instance to be running after the flip 228.

In an embodiment, when errors occur during the flip 228, the migration manager 218, the system manager 220, or some other computer system entity (e.g., a hypervisor or an agent running on the source location and/or on the target location) performs one or more operations in response to the error. Examples of errors that may occur include, but are not limited to, the failure to prepare the target location 222 to instantiate the new VM instance 224, the failure to attach one or more resources to the new VM instance 224, the failure to detach one or more resources from the original VM instance 216, or some other failure (e.g., a power outage during migration). Such errors may be ignored if they are of a type classified as not being harmful to the migration or if ignoring the error allows the error to be processed by some other process, module, application, or service. For example, an error in migrating a device may be ignored if ignoring such an error results in the device being impaired after the migration and such impairment is detected by a process, module, application, or service associated with the device. Such errors may also cause the migration manager 218 and/or one or more other services to cancel the migration and attempt to undo the migration by undoing the operations that occurred prior to the attempted flip.

As an example of operations that could be performed to undo the migration, the migration manager 218 and/or one or more other services may invalidate, disable, and/or deactivate one or more credentials to access resources that may have been granted to the new VM instance 224 at the target location 222. The migration manager 218 and/or one or more other services may also re-validate, enable, and/or reactivate one or more credentials to access resources that may have been suspended for the original VM instance 216 at the source location 214. In an embodiment, the migration manager 218 and/or one or more other services restore the state of the system to the point before the migration by performing a new attachment to the resources, thus generating a new set of credentials to access the resources.

In another embodiment, the migration manager 218 provides a workflow to perform the flip 228, directing the original VM instance 216 and/or the source location 214 to perform one or more operations to cause the flip to occur. In such an embodiment, the migration manager 218 also provides one or more workflow operations to the original VM instance or the new VM instance to undo the flip in the event of an error.

One or more actions may be performed in association with the workflow to handle errors and/or to undo the flip, depending on the cause and severity of the error. For example, the errors may be handled by resuming the original VM instance 216 at the source location 214 or by resuming the new VM instance 224 at the target location 222. In the event that the migration manager 218 cannot easily determine which VM instance to resume (e.g., in the event of a loss of a connection between the VM instances where the migration manager 218 cannot determine the state of the VM instances), the migration manager 218 may send commands to both of the VM instances, putting them both in a waiting state before determining which VM instance to resume and which to terminate. In the event of a catastrophic failure such as, for example, a power outage during the migration, the migration manager 218 may also have to wait until after power restoration to determine the state of the VM instances and/or to determine which may be resumed or restarted. As described above, the migration manager 218 performs operations that cause at most one VM instance to be running at the end of the flip. In the event of a catastrophic failure, the migration manager may not be able to determine which VM instance to resume and may instead issue an alarm or an alert to inform an entity associated with the computing resource service provider of the indeterminable state.
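The resume decision described above can be illustrated with a minimal sketch, assuming a hypothetical model in which the migration manager either knows whether the flip completed (`True` or `False`) or cannot determine the state of the instances (`None`). Whatever the input, at most one instance is resumed, and an indeterminate state results in both instances being held and an alarm being raised. The names `Action` and `resolve_flip_error` are illustrative only.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RESUME_SOURCE = "resume original instance at source"
    RESUME_TARGET = "resume new instance at target"
    HOLD_AND_ALERT = "keep both instances waiting and raise an alarm"


def resolve_flip_error(flip_completed: Optional[bool]) -> Action:
    """Pick at most one instance to resume after an error during the flip.

    flip_completed is True or False when the manager can observe both
    instances, or None when their state cannot be determined (for example,
    after a lost connection or a power outage).
    """
    if flip_completed is None:
        return Action.HOLD_AND_ALERT  # neither instance is resumed
    return Action.RESUME_TARGET if flip_completed else Action.RESUME_SOURCE
```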

In an embodiment, the migration manager 218 can determine whether the flip is successful by comparing a state of the original VM instance 216 to a state of the new VM instance 224. The state of the original VM instance 216 can be determined after the original VM instance 216 is locked and can be updated due to changes that may occur as the original VM instance 216 converges. The state of the new VM instance 224 can be determined after the flip has completed and after all changes have been forwarded from the original VM instance 216 to the new VM instance 224 (e.g., also after the original VM instance 216 converges). If a difference between the state of the original VM instance 216 and the state of the new VM instance 224 is below a minimum success threshold (i.e., the differences are minor, insignificant, or immaterial), then the flip is successful. Conversely, if the difference between the state of the original VM instance 216 and the state of the new VM instance 224 is above the minimum success threshold (i.e., the differences are major, significant, or material), then the flip is a failure. Note that when the migration is cancelled or when requests are blocked, the differences may be above the minimum success threshold and the flip may be a failure.

FIG. 3 illustrates an example environment 300 where a workflow associated with the migration of a virtual machine instance is presented as described in FIG. 1 and in accordance with at least one embodiment. A request to migrate 302 a virtual machine may be received by a migration manager 304 as described above. In an embodiment, the migration manager determines whether the migration is likely to succeed 306 based on an indicator of success of the migration (also referred to herein as determining a “likelihood of success of the migration” or more simply as determining a “likelihood of success”). For example, the migration manager 304 may determine an indicator of success of the migration by calculating a probability (e.g., between zero and one) from a probability model based on past migrations. The migration manager may also determine an indicator of success of the migration by examining a system state 308 (as described herein) and determining whether a set of conditions has been satisfied and/or is likely to be satisfied. As may be contemplated, the methods of determining an indicator of success of a migration described herein are merely illustrative examples and other methods of determining an indicator of success of a migration may be considered as within the scope of the present disclosure.

The determination of the indicator of success of the migration, or whether the migration is likely to succeed 306, may include evaluating the system state 308 of one or more services or resources 310. For example, if the system state 308 indicates that a virtual machine is currently experiencing a very high volume of network or storage activity, that virtual machine may not be a good candidate for migration. The determination of whether the migration is likely to succeed 306 may also include evaluating a migration history 322 (also referred to herein as “migration history data”) that includes results (e.g., the type of migration and whether it was successful or not) of one or more previous virtual machine migrations (also referred to herein as “previous migrations”). The migration history data may also include one or more prior system states from one or more previous migrations. For example, if the migration history 322 indicates that a certain type of virtual machine instance is rarely successfully migrated because, for example, one of the steps times out, then that virtual machine may also not be a good candidate for migration.
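As one simple illustration of a probability model based on past migrations, the following sketch estimates the likelihood of success as the smoothed success rate of previous migrations of the same instance type and treats very high I/O activity as disqualifying. The Laplace smoothing, the I/O limit of 10,000 operations per second, and the 0.8 probability cutoff are all assumptions chosen for the example, as are the names `PastMigration`, `success_probability`, and `likely_to_succeed`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PastMigration:
    instance_type: str
    succeeded: bool


def success_probability(history: List[PastMigration], instance_type: str) -> float:
    """Laplace-smoothed success rate of past migrations of the same instance type."""
    relevant = [m for m in history if m.instance_type == instance_type]
    successes = sum(1 for m in relevant if m.succeeded)
    return (successes + 1) / (len(relevant) + 2)


def likely_to_succeed(history: List[PastMigration], instance_type: str,
                      io_ops_per_sec: float, *, max_io: float = 10_000,
                      min_probability: float = 0.8) -> bool:
    # Very high network or storage activity makes the VM a poor candidate.
    if io_ops_per_sec > max_io:
        return False
    return success_probability(history, instance_type) >= min_probability
```

With no history for an instance type the smoothed estimate is 0.5, so such an instance would not clear the assumed 0.8 cutoff until some successful migrations have been recorded.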

As a result of determining that the virtual machine is a good candidate for migration and, for example, that the migration is likely to succeed 306, the migration manager 304 may then begin the migration 312. The migration may be based on a migration workflow 314 that may split the migration into phases as described herein. The migration workflow 314 may specify an order of one or more migration operations configured to, for example, prepare the target, commission the target location, flip the virtual machine, complete the migration cleanup, and/or perform other such migration operations. In the example illustrated in FIG. 3, the migration workflow 314 is split into four phases: a prepare phase, a commission phase, a flip phase, and a cleanup phase. These four phases are described in more detail below. A migration workflow is configured so that the migration manager 304 may determine the correct API requests and/or the order of those API requests so that the migration commands 316 sent to the services and resources 310 are performed in the correct order. If it is not determined that the migration is likely to succeed 306, the migration manager may send a message indicating as such to a requestor and/or may queue the migration so that it may be attempted at a later time.
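A migration workflow of the kind described above can be modeled, for illustration, as an ordered list of phases, each carrying the API requests it issues. The sketch below assumes hypothetical command names (`reserve_slot`, `attach_storage`, and so on) and simply flattens the workflow into the order in which the migration commands would be sent, rejecting a workflow whose phases are out of order.

```python
from enum import IntEnum
from typing import List, Tuple


class Phase(IntEnum):
    PREPARE = 1
    COMMISSION = 2
    FLIP = 3
    CLEANUP = 4


# Hypothetical workflow: each phase lists the API requests it issues, in order.
WORKFLOW: List[Tuple[Phase, List[str]]] = [
    (Phase.PREPARE, ["reserve_slot", "reserve_hardware"]),
    (Phase.COMMISSION, ["attach_storage", "attach_network", "start_copy"]),
    (Phase.FLIP, ["lock", "copy_remaining_state", "switch_connection"]),
    (Phase.CLEANUP, ["teardown_source"]),
]


def migration_commands(workflow: List[Tuple[Phase, List[str]]]) -> List[str]:
    """Flatten the workflow into the order in which commands must be sent."""
    phases = [phase for phase, _ in workflow]
    if phases != sorted(phases):
        raise ValueError("workflow phases are out of order")
    return [command for _, commands in workflow for command in commands]
```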

Based on the migration workflow 314, the migration manager 304 may begin generating migration commands 316 to be sent to the services and resources 310 associated with the migration. In an embodiment, the services and resources are provided by a computing resource service provider, such as the computing resource service provider 102 described in connection with FIG. 1. In another embodiment, some or all of the services and resources are provided by a customer or a third party associated with the computing resource service provider.

During the migration phases, the system state 308 may be continually monitored by the migration manager 304 so that, for example, spikes in resource demand may be determined. Additionally, migration data 318 may be collected 320 such as, for example, the length of time that migration sub-steps take to complete (e.g., how long it takes to perform a migration operation associated with a particular service or resource), whether such sub-steps succeed or fail, or possible reasons for success or failure. The migration data 318 may be collected 320 and stored in the migration history 322 to inform subsequent migrations. Based on the system state 308 and/or based on the migration data 318, the migration manager 304 may determine whether or not to cancel 324 the migration before it completes. The migration manager 304 may also determine whether or not to cancel 324 the migration in the event of a timeout as described herein.
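A cancellation decision based on collected migration data and monitored system state might look like the following sketch. It assumes hypothetical per-step timeouts, a normalized resource-demand figure between zero and one, and a spike threshold of 0.9; none of these values or the names `StepRecord` and `should_cancel` come from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StepRecord:
    """Migration data collected for one sub-step (stored in migration history)."""
    name: str
    seconds: float
    succeeded: bool


def should_cancel(records: List[StepRecord], timeouts: Dict[str, float],
                  resource_demand: float, max_demand: float = 0.9) -> bool:
    """Cancel on a demand spike, a failed sub-step, or a sub-step past its timeout."""
    if resource_demand > max_demand:
        return True
    return any(not r.succeeded or r.seconds > timeouts.get(r.name, float("inf"))
               for r in records)
```

Sub-steps without a configured timeout are never cancelled for duration in this sketch; a real policy could instead derive default timeouts from the migration history.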

If it is determined to cancel 324 the migration, the migration manager 304 may perform the cancellation based on a cancel and rollback workflow 326 (also referred to herein as a “cancellation workflow”) that may specify the order for a set of cancellation operations and may also specify the order for a set of rollback operations. In an embodiment, the cancel and rollback workflow 326 is part of the migration workflow 314 (i.e., the cancellation and rollback workflow is a subset of the set of operations that specify the migration workflow). The set of cancellation operations and the set of rollback operations, collectively referred to herein as a set of cancel and rollback commands 328, may be sent to the services and resources 310 as a result of the cancel and rollback workflow being performed by the migration manager 304. The decision to cancel 324 the migration may also be stored in the migration history 322.

The system state 308 and/or the migration data 318 may be used to determine whether a migration should occur as described herein, may be used to determine the best time to perform a migration, and/or may also be used to determine whether the migration is proceeding correctly. In an embodiment, the system state 308 and/or the migration data 318 can also be used by the migration manager 304 to improve workflows, adjust timeouts, improve memory convergence, or to determine other parameters associated with a migration. In such an embodiment, the migration manager 304 can include a machine learning system configured to receive the system state 308 and/or the migration data 318 and evaluate it against the migration history 322 to improve workflows, adjust timeouts, improve memory convergence, or to determine other parameters associated with a migration. The machine learning system may also be configured to improve determinations about when and how to cancel a migration and/or to improve determinations about which migrations are especially good (or especially bad) candidates. Additionally, although not illustrated in FIG. 3, the system state 308 may also be used in conjunction with the migration workflow 314 to, for example, alter the workflow, make workflow decisions (e.g., to perform certain actions in response to changes in the system state 308), or to execute workflow steps such as, for example, to perform cleanup, cancel, or rollback operations associated with the migration.

FIG. 4 illustrates an example process 400 for managing the phases of a virtual machine instance migration as described in FIG. 1 and in accordance with at least one embodiment. A migration manager, such as the migration manager 104 described in connection with FIG. 1, may perform at least a part of the process illustrated in FIG. 4. A system manager, such as the system manager 108 described in connection with FIG. 1, may also perform at least a part of the process illustrated in FIG. 4.

A migration manager may first receive a request to perform a migration 402 of a virtual machine instance. The migration manager may then locate a target 404 to which the virtual machine instance may be migrated. The migration manager may locate the target based on resource availability, proximity to a customer, proximity to system resources, resource cost, or other such considerations. In an embodiment, a representation of the desired capabilities can be generated as, for example, a hash representation of the parameters of the desired capabilities. These parameters may include the size of the virtual machine instance, the type of processor or processors needed, the amount of memory, an operating system version, and/or software versions desired. The desired capabilities may be communicated to a virtual machine service using one or more API requests, or may be communicated to a virtual machine service as a set (i.e., in bulk or batches).
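One way to produce a hash representation of the desired capabilities is to serialize the parameters canonically (with sorted keys) and hash the result, so that the same set of capabilities always yields the same value regardless of the order in which the parameters were supplied. The sketch below uses SHA-256 over JSON; the choice of hash function, the serialization, and the parameter names are assumptions for illustration.

```python
import hashlib
import json
from typing import Any, Dict


def capability_hash(capabilities: Dict[str, Any]) -> str:
    """Order-independent SHA-256 digest of the desired target capabilities."""
    canonical = json.dumps(capabilities, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical parameters mirroring those listed above.
desired = {"instance_size": "large", "processor": "x86_64",
           "memory_gib": 16, "os_version": "5.10"}
key = capability_hash(desired)  # 64 hexadecimal characters
```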

The migration manager may also direct the system manager to locate a target 404 to which the virtual machine instance may be migrated. The migration manager and/or the system manager may also direct a third system or service to locate a target 404 to which the virtual machine instance may be migrated. For example, the migration manager may generate a request for a target based on the desired capabilities of the target (e.g., type of CPU, type of hypervisor, installed software, associated hardware, etc.) and may send this request to the system manager. The system manager may then forward this request to a virtual machine service that may be configured to provide a set of one or more candidate targets in response to that request based on the desired capabilities. The system manager may then choose a subset of the set of one or more candidate targets and may provide that subset to the migration manager. As may be contemplated, the methods and systems for locating a target to which the virtual machine instance may be migrated that are described herein are merely illustrative examples, and other methods and systems for locating a target to which the virtual machine instance may be migrated may be considered as within the scope of this disclosure.
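The candidate-selection step can be sketched as a filter over hosts followed by choosing a subset. In this illustration each host is a hypothetical dictionary with a `capabilities` map and a `free_memory_gib` figure; hosts must match every required capability exactly, and the subset returned prefers hosts with the most free memory. The ranking rule and the default subset size of two are assumptions, not requirements of the disclosure.

```python
from typing import Any, Dict, List


def candidate_targets(hosts: List[Dict[str, Any]], required: Dict[str, Any],
                      limit: int = 2) -> List[str]:
    """Names of up to `limit` hosts meeting every required capability,
    preferring those with the most free memory."""
    matches = [h for h in hosts
               if all(h["capabilities"].get(k) == v for k, v in required.items())]
    matches.sort(key=lambda h: h["free_memory_gib"], reverse=True)
    return [h["name"] for h in matches[:limit]]
```

An empty result corresponds to the case in which no target is located, after which an error may be returned to the requester and the request queued.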

If it is not the case that a target is located 406, then the migration manager may generate an error 408 and send it to the requester of the migration. In addition to generating an error 408, the migration manager may also queue the request for migration for later processing. If it is the case that a target is located 406, the migration manager may begin to prepare the target 410. The migration manager may begin to prepare the target 410 by, for example, generating one or more API requests to the target to reserve and/or create a location for the virtual machine instance (the location may also be referred to herein as a “slot”), to reserve hardware and/or other resources associated with the virtual machine instance, and/or to instantiate a base virtual machine instance that may be used to migrate the virtual machine instance.

While it is not shown in the process illustrated in FIG. 4, the migration manager may determine to cancel the migration at several points during the process. For example, while the migration continues to prepare the target 410, the migration manager may determine that the migration is not likely to succeed as described above. Upon this determination, the migration manager may cancel the migration and perform any rollback necessary to return the system to a known state. Similarly, the migration manager may determine to cancel the migration if part of the process of preparing the target 410 takes too long, or if maintaining the synchronization between the virtual machine instance at the source and the virtual machine instance at the target becomes too costly. The migration manager may also cancel the migration at other steps of the process illustrated in FIG. 4 such as, for example, before the lock of the source, during the lock of the source, during the commission of the target location, during the flip from the source to the target, or after the flip from the source to the target has completed.

If it is not the case that the target is prepared 412, the migration manager may begin a rollback 424 and, after the rollback, may resume the virtual machine instance at the source 426. In addition to performing the rollback and restore operations, the migration manager may also queue the request for migration for later processing. If it is the case that the target is prepared 412, the migration manager may then begin monitoring and synchronizing the source and target 414 as described herein.

The migration manager may then commission the VM instance in the target location 416 (also referred to herein simply as “commission the target location”). The migration manager may commission the VM instance in the target location by performing a process or workflow comprising a set of operations that prepare the target location to load an image of the virtual machine instance and to execute the virtual machine image. The migration manager may also perform additional operations associated with the commission of the target location 416 (e.g., in addition to those described herein) including, but not limited to, provisioning the VM instance, attaching resources to the VM instance, verifying the VM instance, or executing one or more additional processes using the VM instance after the VM instance is executing.

The migration manager may commission the target location 416 by, for example, verifying the target, creating interfaces for the virtual machine instance at the target, attaching storage and network resources to the virtual machine instance at the target, associating credentials with the virtual machine instance at the target, launching the virtual machine instance at the target, and beginning the process of copying memory and state from the virtual machine instance at the source to the virtual machine instance at the target. This copying of memory and state from the virtual machine instance at the source to the virtual machine instance at the target may be performed while the virtual machine instance at the source is still running. This may require the migration manager to also track changes made to the virtual machine instance at the source and to propagate those changes to the virtual machine instance at the target during and/or after the copy.

In an embodiment, the migration manager will commission the target location 416 by providing packet forwarding from the source to the target. This packet forwarding will allow the virtual machine instance at the source to continue receiving data packets from services and/or resources and to forward those data packets to the virtual machine instance at the target. This packet forwarding may also allow both virtual machine instances to send and receive data on behalf of the other, thereby retaining connections with the external services and/or resources associated with the virtual machine instances during the migration. For example, an I/O request from the virtual machine instance at the source to a block storage service may receive a response to that request during migration. The response may be received at the virtual machine instance at the source and then forwarded to the virtual machine instance at the target. Further actions based on that response may be performed by the source or may be performed by the target purporting to be the source (i.e., so that an error is not generated). Such packet forwarding may continue throughout the migration.
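The source side of the packet forwarding described above can be illustrated with a small sketch: the source keeps accepting packets as before and, once forwarding is enabled, also queues each packet for delivery to the target in arrival order. The class `SourceEndpoint` and its in-memory queue are hypothetical stand-ins for whatever forwarding mechanism a hypervisor or network layer actually provides.

```python
from collections import deque
from typing import Any, Deque, List


class SourceEndpoint:
    """Hypothetical source-side packet handler used during migration."""

    def __init__(self) -> None:
        self.forwarding = False
        self.received: List[Any] = []
        self._to_forward: Deque[Any] = deque()

    def receive(self, packet: Any) -> None:
        # The source keeps accepting packets; while forwarding is on, each is
        # also queued for the target instance.
        self.received.append(packet)
        if self.forwarding:
            self._to_forward.append(packet)

    def flush(self, target_inbox: List[Any]) -> int:
        """Deliver queued packets to the target in arrival order."""
        count = 0
        while self._to_forward:
            target_inbox.append(self._to_forward.popleft())
            count += 1
        return count
```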

The migration manager may determine that the process to commission the target location 416 has completed after one or more conditions are met, for example, when all API requests associated with the process to commission the target location 416 have been issued, all responses have been received from the services and/or resources, and no further data is expected. In an embodiment, the migration manager will wait for one or more systems to reach a known state (also referred to herein as “converging”) before determining that the process to commission the target location 416 has completed. The migration manager may also determine that the process to commission the target location 416 has completed if there is an error, or if there is a timeout, or if it becomes apparent that the migration will not succeed.
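The completion conditions above can be tracked with a small bookkeeping object, sketched here under the assumption that the set of expected API requests is known in advance and that a single deadline applies to the whole commission process. The class `CommissionTracker` and its status strings are hypothetical; "failed" and "timed_out" are both forms of completion that lead to a rollback rather than to the lock.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class CommissionTracker:
    """Tracks API requests for the commission process (hypothetical helper)."""
    expected: Set[str]
    deadline: float  # time.monotonic() value after which the process times out
    issued: Set[str] = field(default_factory=set)
    responded: Set[str] = field(default_factory=set)
    errors: List[str] = field(default_factory=list)

    def status(self, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if self.errors:
            return "failed"
        if self.expected <= self.issued and self.expected <= self.responded:
            return "complete"
        if now > self.deadline:
            return "timed_out"
        return "in_progress"
```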

Upon completion of the process to commission the target location 416, if the process has not completed successfully 418, the migration manager may begin a rollback 424 and, after the rollback, may resume the virtual machine instance at the source 426. In addition to generating an error 408, the migration manager may also queue the request for migration for later processing as described above. Conversely, upon completion of the process to commission the target location 416, if the process has completed successfully 418, the migration manager may proceed to the lock of the source and target 420 by, for example, locking a virtual machine abstraction associated with the migration.

When the migration manager locks the virtual machine instance 414 at the source and the virtual machine instance at the target by locking a virtual machine abstraction, this lock on the virtual machine instances may prevent any entity from performing any actions on the virtual machine instances that may substantially alter the virtual machine instance (also referred to herein as “mutating” the virtual machine instance). Examples of operations that may be prevented by the lock are adding storage volumes to the virtual machine instance, changing the network interface of the virtual machine instance, stopping the virtual machine instance, or other such actions. The lock may prevent all such actions or may prevent some and allow others. The lock may also generate warnings and/or errors to the user so that the user may determine whether to override (or ignore) the lock.
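A lock on the virtual machine abstraction that holds back mutating operations, while letting non-mutating ones and explicit overrides through, might be sketched as follows. The set of operations treated as mutating, the `override` flag, and the class name `VirtualMachineLock` are assumptions made for this illustration. Held operations are returned on unlock so that they can proceed afterward, consistent with the unlocking step described for the end of the migration.

```python
from typing import List

# Hypothetical set of operations classified as mutating.
MUTATING_OPERATIONS = {"attach_volume", "change_network_interface", "stop_instance"}


class VirtualMachineLock:
    """Lock on the VM abstraction that holds back mutating operations."""

    def __init__(self) -> None:
        self.locked = False
        self.pending: List[str] = []

    def request(self, operation: str, override: bool = False) -> str:
        if self.locked and operation in MUTATING_OPERATIONS and not override:
            self.pending.append(operation)  # held until the lock is released
            return "blocked"
        return "applied"

    def unlock(self) -> List[str]:
        """Release the lock and return the held operations so they can proceed."""
        self.locked = False
        released, self.pending = self.pending, []
        return released
```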

After the lock, the migration manager may then proceed to the flip 422. Although not illustrated in FIG. 4, the migration manager may perform one or more operations prior to the flip 422 to begin cleanup after the migration. For example, in the event that the migration will not complete successfully (e.g., failing either at the prepare phase or at the commission phase), the migration manager may have completed a number of operations associated with the migration. To facilitate cleanup, the migration manager may store a stack of operations performed, so that the stack of operations may be used in the subsequent cleanup. Similarly, the migration manager may perform steps during the migration to clean up certain operations if, for example, the changes associated with those operations are no longer required for the migration. Such operations that may be cleaned up early may include temporary storage of files, temporary access to resources, or other such operations. It should be noted that the stack of operations that occur during the migration grows as the migration progresses so that, at the flip 422, the operations that may need to be rolled back in the rollback 424 may be the most numerous and/or the most complex.
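The stack of operations described above can be illustrated as a last-in, first-out list in which each completed operation is recorded together with a callable that undoes it; rolling back pops and undoes operations newest-first. The class name `OperationStack` and the idea of pairing each operation with an undo callable are assumptions for this sketch.

```python
from typing import Callable, List, Tuple


class OperationStack:
    """Records completed migration operations with their undo actions (LIFO)."""

    def __init__(self) -> None:
        self._stack: List[Tuple[str, Callable[[], None]]] = []

    def record(self, name: str, undo: Callable[[], None]) -> None:
        self._stack.append((name, undo))

    def rollback(self) -> List[str]:
        """Undo operations newest-first; return their names in undo order."""
        undone = []
        while self._stack:
            name, undo = self._stack.pop()
            undo()
            undone.append(name)
        return undone
```

Because every recorded operation stays on the stack until the rollback, the stack is largest at the flip, which matches the observation that rollback is most involved at that point.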

In an embodiment, the migration manager will lock the source and target virtual machine instances at an earlier time such as, for example, before the commission of the target location 416. In another embodiment, the migration manager will delay the lock of the source and target virtual machine instances as late as possible in the migration process, and wait until after the commission of the target location 416, or delay until after the flip 422 has begun. This delayed locking (also referred to herein as “optimistic” locking) minimizes the time that a user may be unable to interact with a virtual machine instance that has been selected for migration by keeping the virtual machine unlocked during the commission phase.

Optimistic locking, described in detail below, may be accomplished by categorizing changes that may be received at the running virtual machine instance into whether or not they introduce changes, whether those changes are changes to the user-visible abstraction of the virtual machine instance or to the domain (i.e., the actual virtual machine instance as instantiated), and whether those changes can be blocked by the migration manager. Each time changes are received that change the virtual machine instance, a version number for the virtual machine instance is incremented. Each time changes are received that change the domain, a version number for the domain may be incremented. If, during the migration, the version numbers diverge from where they were at the beginning of the migration, the migration manager may either attempt to synchronize the changes, block the changes to the source so that they may be applied to the target after migration, or cancel the migration. Version numbers are described in more detail below. In an embodiment, the migration manager will optimize for cancelling the migration, thus minimizing disruption of the customer experience.
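The version-number bookkeeping can be illustrated with the sketch below, which keeps one counter for the user-visible abstraction and one for the domain, snapshots both at the start of the migration, and compares them later. The `reconcile` function shows only one possible policy (block blockable changes, otherwise cancel); it omits the synchronization option mentioned above, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versions:
    abstraction: int = 0  # user-visible VM abstraction
    domain: int = 0       # the VM as actually instantiated


class VersionTracker:
    def __init__(self) -> None:
        self.current = Versions()

    def apply(self, changes_abstraction: bool, changes_domain: bool) -> None:
        # Each accepted change bumps the version of whatever it touched;
        # requests that change nothing leave both versions alone.
        self.current = Versions(
            self.current.abstraction + int(changes_abstraction),
            self.current.domain + int(changes_domain),
        )


def reconcile(at_start: Versions, now: Versions, blockable: bool) -> str:
    """One possible policy when versions diverge during migration."""
    if at_start == now:
        return "proceed"
    return "block_until_after_migration" if blockable else "cancel"
```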

As described above, if, upon completion of the process to commission the target location 416, the process has completed successfully 418 and the source and target are locked, the migration manager may proceed to the flip 422. The migration manager may perform one or more operations prior to the flip 422 such as, for example, verifying that a substantial portion of the memory and/or state has been copied from the virtual machine instance at the source to the virtual machine instance at the target, verifying that all interfaces and resources are correctly attached to the virtual machine instances, verifying that the remaining memory and/or state changes are sufficiently minor as to be quickly propagated to the virtual machine instance at the target, and readying any resources for the final transition from the virtual machine instance at the source to the virtual machine instance at the target.

After the flip 422, the migration manager may then determine whether the virtual machine instance was successfully flipped 428 from the source to the target. The virtual machine instance was successfully flipped 428 from the source to the target if the memory and/or state (collectively referred to herein as the “instance state”) of the virtual machine instance at the target is sufficiently the same as the instance state of the virtual machine instance at the source, such that the difference between the instance state of the virtual machine instance at the source and the instance state of the virtual machine instance at the target is less than a threshold value. The difference between the instance state of the virtual machine instance at the source and the instance state of the virtual machine instance at the target may be determined by, for example, computing a hash value of one or more parameters specified within the respective instance states and comparing those hash values.
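The hash comparison can be sketched as follows, assuming that each instance state is available as a dictionary of JSON-serializable parameters and that the difference is measured as the number of listed parameters whose hashes disagree. The SHA-256 choice, the JSON serialization, and the default threshold of one (meaning any mismatch fails the flip) are illustrative assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List


def parameter_hash(state: Dict[str, Any], parameter: str) -> str:
    """SHA-256 of one parameter's value, serialized canonically as JSON."""
    encoded = json.dumps(state.get(parameter), sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def flip_succeeded(source_state: Dict[str, Any], target_state: Dict[str, Any],
                   parameters: List[str], threshold: int = 1) -> bool:
    """Success when fewer than `threshold` parameters hash differently."""
    difference = sum(parameter_hash(source_state, p) != parameter_hash(target_state, p)
                     for p in parameters)
    return difference < threshold
```

Raising `threshold` corresponds to tolerating minor, immaterial differences, as described above for the minimum success threshold.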

If it is not the case that the virtual machine instance was successfully flipped 428 from the source to the target, the migration manager may perform one or more operations to roll back 424 the migration as described herein, and may resume the virtual machine instance at the source 426 so that the virtual machine instance at the source may continue to operate. If it is not the case that the virtual machine instance was successfully flipped 428 from the source to the target, the migration manager may also generate an error such as the error 408 as described above and send it to the requester of the migration. In addition to generating an error, the migration manager may also queue the request for migration for later processing.

If it is the case that the virtual machine instance was successfully flipped 428 from the source to the target, the migration manager may start the virtual machine instance at the target 430 and may complete the teardown of the source 432 as described herein, so that the virtual machine instance at the target may operate in place of the virtual machine instance at the source, thus completing the successful migration. In an embodiment, the migration manager will unlock the virtual machine instance at the source prior to the teardown of the source 432 to allow any blocked or pending mutating changes to proceed. These blocked or pending mutating changes may also be propagated to the virtual machine instance at the target via the packet forwarding. The teardown of the source 432 may remove duplicate network mapping, may remove redundant block storage connections, and may terminate connections with other services and/or resources. The migration manager may ensure that all connections have converged (i.e., reached a known good state) prior to the teardown of the source 432.

FIG. 5 illustrates an example environment 500 where the first phase of a virtual machine instance migration is presented as described in FIG. 1 and in accordance with at least one embodiment. The first phase illustrated in FIG. 5 is the prepare phase. In this phase, managers 502, such as the migration manager 104 and the system manager 108 described in connection with FIG. 1, prepare the target location to receive the migrated virtual machine instance. The original VM instance 506 is running at the source location 504 with access to one or more services and resources 508 as described herein. Connections between the original VM instance 506 and the services and resources 508 may include:
- connections to block storage devices provided by a block storage service;
- connections to a network via a network interface;
- connections to a redundant storage service; or
- other such connections.

The connections may be assigned to the virtual machine instance for the life of the virtual machine instance. Alternatively, they may be temporarily provided to the virtual machine instance (e.g., may be "leased") and managed by a service such as a block storage service. During the prepare phase, the managers 502 may locate a target location 510 based on desired capabilities and on these connections to the services and resources, and may create a new VM slot 512 at the target location 510. The target location 510 may be selected based on the desired capabilities as described above.

FIG. 6 illustrates an example environment 600 where the second phase of a virtual machine instance migration is presented as described in FIG. 1 and in accordance with at least one embodiment. The second phase illustrated in FIG. 6 is the commission phase. In this phase, managers 602, such as the migration manager 104 and the system manager 108 described in connection with FIG. 1, commission the virtual machine instance at the target location. They also copy memory and/or state from the virtual machine instance at the source to the virtual machine instance at the target. The original VM instance 606 is running at the source location 604 with access to one or more services and resources 608 as described herein. During the commission phase, the managers 602 may perform operations so that the new VM instance 612 at the target location 610 may acquire access to one or more of the services and resources 608 associated with the original VM instance 606 at the source location 604. The managers 602 may also cause memory and/or state to be copied from the original VM instance 606 at the source location 604 to the new VM instance 612 at the target location 610. Finally, the managers 602 may configure the original VM instance 606 to forward packets to the new VM instance 612.

This forwarding 614 from the original VM instance 606 at the source location 604 to the new VM instance 612 at the target location 610 may proceed throughout the process to commission the target. The forwarding 614 allows the new VM instance 612 to become congruent with the original VM instance 606 (also referred to herein as becoming "aligned" with it, or as "converging" with it). In an embodiment, the convergence of the new VM instance 612 with the original VM instance 606 is a condition for the completion of the commission phase of the migration (i.e., the commission phase does not complete until the virtual machines converge). Note that in the example illustrated in FIG. 6, neither the original VM instance 606 nor the new VM instance 612 is locked. This illustrates an example of optimistic locking, that is, of delaying the lock until the flip phase.
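The convergence condition can be illustrated with a minimal pre-copy style loop. This is a sketch under stated assumptions, not the described implementation. State is modeled as a dictionary, and `drain_changes` is a hypothetical hook that returns whatever the still-running source changed since the previous call. The commission step finishes only when a round brings no new changes. The `max_rounds` bound is an added assumption so that a source that never settles does not loop forever.

```python
from typing import Callable


def commission(source: dict[str, bytes],
               drain_changes: Callable[[], dict[str, bytes]],
               max_rounds: int = 10) -> dict[str, bytes]:
    """Copy state to a new target, then forward changes until the target converges.

    `drain_changes` (hypothetical) returns the parameters the running source has
    modified since the previous call, or an empty dict once nothing has changed.
    """
    target = dict(source)            # initial bulk copy while the source keeps running
    for _ in range(max_rounds):
        changes = drain_changes()    # mutations that happened during the last round
        if not changes:
            return target            # converged: the commission phase may complete
        target.update(changes)       # forward the changes to the target
    raise TimeoutError("target did not converge")
```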

FIG. 7 illustrates an example environment 700 where the third phase of a virtual machine instance migration is presented as described in FIG. 1 and in accordance with at least one embodiment. The third phase illustrated in FIG. 7 is the flip phase. In this phase, managers 702, such as the migration manager 104 and the system manager 108 described in connection with FIG. 1, complete the migration of the virtual machine instance at the source to the virtual machine instance at the target. The original VM instance 706 is running at the source location 704 with access to one or more services and resources 708 as described herein. However, both the original VM instance 706 and the new VM instance 712 may be locked so that any mutating changes to the original VM instance 706 are blocked until the migration has completed. Additionally, both VM instances may be paused or locked to further ensure that neither receives mutating changes. In this phase, the copying and forwarding 714 of packets from the original VM instance 706 at the source location 704 to the new VM instance 712 at the target location 710 may continue, so that mutating changes made prior to the lock can finish converging.

FIG. 8 illustrates an example environment 800 where the fourth phase of a virtual machine instance migration is presented as described in FIG. 1 and in accordance with at least one embodiment. The fourth phase illustrated in FIG. 8 is the cleanup phase. In this phase, managers 802, such as the migration manager 104 and the system manager 108 described in connection with FIG. 1, perform any final steps of the migration of the virtual machine instance at the source to the virtual machine instance at the target. These steps depend on whether the flip succeeded or failed.

For a successful flip, the managers 802 may tear down the original VM instance 806 at the source location 804, removing its access to the services and resources 808. The packet forwarding may continue, but may stop on convergence of the target location 814. Meanwhile, the new VM instance 812 at the target location 814 may replace the original VM instance 806 at the source location 804. The new VM instance 812 takes over access to the services and resources 808 formerly associated with the original VM instance 806.

For an unsuccessful flip (e.g., due to a failure or a cancellation), the managers 802 may roll back the migration. They may unlock the original VM instance 826 at the source location 824, remove the new VM instance at the target location 830, and stop packet forwarding 832 from the original VM instance 826. One or more operations associated with the services and resources 828 may also be performed, such as removing redundant connections and/or interfaces.

FIG. 9 illustrates an example diagram 900 showing the phases of a virtual machine instance migration as described in FIG. 1 and in accordance with at least one embodiment. Managers 902, such as the migration manager 104 and the system manager 108 described in connection with FIG. 1, may generate a command to prepare a target 908 to receive the migrated virtual machine instance, as described herein in connection with FIG. 5. The command may be sent to the target location 906, where operations to prepare the target VM 910 may be performed. If the command to prepare the target VM 910 is successful, the managers 902 may then start the optimistic lock 912 of the source and target. The optimistic lock 912 includes operations to monitor and synchronize changes 914, so that mutating changes made to the source VM are propagated to the target VM as described herein. The operations to monitor and synchronize changes 914 may continue until the flip begins, as described below.
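One way to picture the optimistic lock 912 is a pair of replicas in which every mutating change is applied to the source and mirrored to the target until the flip begins. After that point, further changes are blocked. This is a minimal sketch. The `SyncedPair` class, its method names, and the single version counter are hypothetical. In the described system, synchronization happens through packet forwarding and version numbers rather than direct dictionary writes.

```python
class SyncedPair:
    """Hypothetical source/target pair under an optimistic lock.

    Mutating changes are applied to the source and mirrored to the target until
    the flip begins, at which point the pair is pessimistically locked.
    """

    def __init__(self, state: dict[str, str]) -> None:
        self.source = dict(state)
        self.target = dict(state)
        self.version = 0
        self.flip_started = False

    def apply_change(self, key: str, value: str) -> bool:
        if self.flip_started:
            return False               # blocked: the flip holds the pessimistic lock
        self.source[key] = value       # the running source serves the change...
        self.target[key] = value       # ...and the change is synchronized to the target
        self.version += 1
        return True

    def begin_flip(self) -> None:
        self.flip_started = True
```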

After the optimistic lock 912, the managers 902 may then generate commands to commission the target 916. The commands may be sent to the target location 906 as illustrated in FIG. 9. The commands may also be sent to a source location 904 and/or to one or more services or resources as described herein. In response to the commands to commission the target 916, the target location may commission the target VM 918 as described herein.

The command to prepare the target 908 and/or the commands to commission the target 916 may include a version number of the virtual machine instance that will be migrated from the source location 904. The managers 902 may obtain this version number by querying the source location 904. The managers 902 may also query the source location 904 and/or the target location 906 for version numbers. These version numbers may be used by the managers 902 to determine readiness and/or convergence as described below.

The managers 902 may then determine whether the source is ready 922 by waiting for the source location 904 to indicate that it is ready for migration 924. The managers 902 may wait indefinitely, until a condition occurs, until a timeout expires, or until a number of iterations have occurred. If the source location 904 does not indicate that it is ready for migration 924, the managers 902 may issue an error or alarm, initiate error handling, or begin some other action in response. Although not illustrated in FIG. 9, the migration may also be cancelled in that case.

The managers 902 may then determine whether the target is ready for migration 926. Readiness of the target may be predicated on the completion of the commissioning of the target VM. Commissioning completes when, for example, the state of the target location 906 converges to the state of the source location 904. Again, the managers 902 may wait for the target location 906 to indicate that it is ready for migration 926 indefinitely, until a condition occurs, or until a timeout expires. As with the source location, the managers 902 may perform one or more error handling operations, and the migration may be cancelled if the target location does not indicate that it is ready for migration 926. To verify readiness, the managers 902 may compare version numbers received in a readiness response from the source and/or the target with a known or determined target version number.
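The bounded wait described above, covering both a timeout and an iteration limit, together with the version-number check, might be sketched as follows. The sketch assumes that the readiness query can be wrapped in a callable that returns the reported version number, or `None` when the location is not yet ready. The function name and parameters are hypothetical. A `False` result leaves the choice of raising an alarm or cancelling the migration to the caller.

```python
import time
from typing import Callable, Optional


def wait_until_ready(poll: Callable[[], Optional[int]],
                     expected_version: int,
                     timeout_s: float = 30.0,
                     interval_s: float = 0.5,
                     max_polls: Optional[int] = None) -> bool:
    """Poll a location until it reports readiness at the expected version.

    `poll` is a hypothetical callable returning the version number carried in
    the location's readiness response, or None if it is not ready yet.
    Returns False on timeout or after `max_polls` attempts.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while time.monotonic() < deadline:
        version = poll()
        polls += 1
        if version is not None and version == expected_version:
            return True                  # ready, and at the version the managers expect
        if max_polls is not None and polls >= max_polls:
            return False                 # iteration limit reached
        time.sleep(interval_s)
    return False                         # timeout expired
```

A caller might check the source first and then the target, and cancel the migration if either call returns `False`.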

Once both the source location 904 and the target location 906 are ready for migration, the managers 902 generate a command to apply the pessimistic lock 928 to the source and target virtual machine instances. The pessimistic lock 928 consists of the final locking of the source 930 and the final locking of the target 932, which prevent any mutating changes during the critical flip phase of the migration. Once both instances are locked, the managers 902 may initiate the flip 934. The flip causes the source location 904 to complete the migration 936 of the virtual machine instance to the target location 906. It also causes the target location 906 to enable the virtual machine instance there by starting 938 it. The diagram illustrated in FIG. 9 does not include the failure of the flip, which is described in more detail herein.
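The pessimistic lock 928 and the flip 934 might be sketched with ordinary mutexes standing in for the final locks on the source and the target. This is an illustrative assumption only: `threading.Lock` merely models "no further mutating changes", and the `flip` callable and function names are hypothetical. As in FIG. 9, the locks stay held after the flip and are released only during teardown and cleanup. If the target lock cannot be obtained, the source lock is released again so that the source is not left locked.

```python
import threading
from typing import Callable


def pessimistic_flip(source_lock: threading.Lock,
                     target_lock: threading.Lock,
                     flip: Callable[[], bool],
                     timeout_s: float = 5.0) -> bool:
    """Take the final locks on source then target, then run the flip.

    `flip` is a hypothetical callable that completes the migration on the source
    and starts the instance on the target, returning True on success.
    The locks remain held; cleanup (teardown or rollback) releases them.
    """
    if not source_lock.acquire(timeout=timeout_s):
        return False                      # could not lock the source: do not flip
    if not target_lock.acquire(timeout=timeout_s):
        source_lock.release()             # back out so the source is not left locked
        return False
    return flip()


def release_after_cleanup(source_lock: threading.Lock,
                          target_lock: threading.Lock) -> None:
    source_lock.release()                 # released as part of tearing down the source
    target_lock.release()                 # unlock the target
```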

Finally, the managers may wait until all memory and/or states have converged 940 and the migration is completed and/or until version numbers have reached a determined state. They then tear down the virtual machine instance 942 at the source location 904 (including releasing the lock), unlock 944 the target location 906, and complete any remaining cleanup 946 of the migration.

FIG. 10 illustrates an example state diagram 1000 showing the state changes of a virtual machine instance migration as described in FIG. 1 and in accordance with at least one embodiment. At the beginning of the virtual machine migration, a virtual machine instance may be running at the source location 1002 as described herein. When the migration enters its first phase, prepare target 1004, the system enters a next state 1006. In that state, the virtual machine instance is still running at the source location 1008 while a virtual machine slot is prepared at the target location 1010. When the migration enters its second phase, commission target 1012, the system enters a next state 1014. In that state, the virtual machine instance is still running at the source location 1016 while a virtual machine instance is commissioned at the target location 1018. In the commission phase, both the source and the target may be locked. Alternatively, one or both may have their locking delayed until later in the migration by using an optimistic locking technique. In the diagram illustrated in FIG. 10, the lock has been delayed as late as possible to reduce the potential impact of the migration. In this example, the last operation of the state 1014 would be to lock the virtual machine instance at the source location.

When the migration enters its third phase, flip 1020, the system enters a next state 1022. In that state, the virtual machine instance is locked at the source location 1024 while the virtual machine instance migration to the locked target location is completed 1026. Both virtual machine instances may be locked in the state 1022 by, for example, locking a virtual machine abstraction associated with both the source virtual machine instance and the target virtual machine instance. In an embodiment, the source virtual machine instance and the target virtual machine instance are instead locked separately.

If the flip fails 1028, the system next enters a failure state 1032. In that state, the virtual machine instance is locked at the source location 1034 while the locked virtual machine instance migration to the target location is terminated 1036. The system then enters a final cleanup and unlock phase 1038, resulting in a virtual machine instance running on the source 1040. This leaves the system just as it was before the migration was attempted. The failed migration may be attempted again later.

If the flip succeeds 1030, the system next enters a success state 1042. In that state, the virtual machine instance is locked at the source location 1044 while the locked virtual machine instance is ready at the target location 1046. The system then enters a final cleanup and unlock phase 1048, resulting in a virtual machine instance running on the target 1050 and a successful migration.
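The state diagram of FIG. 10 can be encoded as an explicit transition table. This is a minimal sketch: the enum member names and event strings are hypothetical labels, and the comments tie each member to the reference numerals above. Any event not listed for the current state is rejected, which reflects the diagram's linear prepare, commission, and flip sequence followed by one of two cleanup paths.

```python
from enum import Enum, auto


class MigrationState(Enum):
    RUNNING_ON_SOURCE = auto()   # 1002, and 1040 after a failed flip
    PREPARING = auto()           # state 1006: slot prepared at the target
    COMMISSIONING = auto()       # state 1014: instance commissioned at the target
    FLIPPING = auto()            # state 1022: source and target locked
    FAILED = auto()              # state 1032
    SUCCEEDED = auto()           # state 1042
    RUNNING_ON_TARGET = auto()   # 1050


# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    (MigrationState.RUNNING_ON_SOURCE, "prepare_target"): MigrationState.PREPARING,
    (MigrationState.PREPARING, "commission_target"): MigrationState.COMMISSIONING,
    (MigrationState.COMMISSIONING, "flip"): MigrationState.FLIPPING,
    (MigrationState.FLIPPING, "flip_failed"): MigrationState.FAILED,
    (MigrationState.FLIPPING, "flip_succeeded"): MigrationState.SUCCEEDED,
    (MigrationState.FAILED, "cleanup_and_unlock"): MigrationState.RUNNING_ON_SOURCE,
    (MigrationState.SUCCEEDED, "cleanup_and_unlock"): MigrationState.RUNNING_ON_TARGET,
}


def step(state: MigrationState, event: str) -> MigrationState:
    """Return the next state, or raise ValueError for an event the diagram does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in {state.name}") from None
```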

FIG. 11 illustrates an example environment 1100 where requests that may modify a migrating virtual machine instance are classified and processed as described in FIG. 1 and in accordance with at least one embodiment. Classifying and processing such requests during the migration may allow for optimistic locking. Under optimistic locking, the lock of the migrating virtual machine is delayed as long as possible, which reduces the impact of the migration on a user.

Requests 1102 may be received by a system manager 1104. The requests 1102 may include API requests, web service requests, library requests, or some other type of request. The requests 1102 may be associated with a migration and may be received from a migration manager as described herein. The requests 1102 may also be independent of the migration. For example, they may be requests received by a virtual machine instance as a result of the operation of, and/or interaction with, the virtual machine instance. A request from a user to establish a connection to a new block storage device provided by a block storage service is one such independent request. Requests that are independent of the migration may also be generated from within the virtual machine instance as described herein. For example, a virtual machine instance may be running an operating system that allows a user to log into the instance and directly issue commands to mount a block storage device.

The requests 1102 may be sent to a virtual machine instance that may be in the process of being migrated as described herein. The requests 1102 may be sent to the virtual machine instance from the migration manager described herein, or from services and/or resources associated with the virtual machine instance. The requests 1102 may also be sent to the domain (i.e., the actual virtual machine instance) or to the host machine where that domain resides. The requests 1102 may also take the form of responses to requests generated by the virtual machine instance (e.g., the virtual machine instance may have requested access to a resource, and the call may be generated based on that request).

The requests 1102 may be classified 1106 by the system manager 1104 as non-mutating 1108, VM abstraction mutating 1114, VM instance mutating 1132, or unblockable 1138. The system manager 1104 may classify 1106 requests 1102 according to a categorization of the request type associated with each request. For example, the system manager 1104 may categorize requests by request types such as:
- "get" requests (e.g., requests that retrieve data from resources);
- "put" requests (e.g., requests that send data to resources); and
- "describe" requests (e.g., requests that describe resources).

Each request may be considered an instance of a request type according to the categorization and classified according to that request type. For example, requests categorized as the "put" request type may be mutating requests, while requests categorized as the "describe" or "get" request types may be non-mutating requests. When the request is an application programming interface request, it may be classified by an application programming interface request type, such as get or put. Each application programming interface request may also be considered an instance of an application programming interface request type.
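Classification by request type amounts to a lookup from request type to one of the four classes of FIG. 11. The table below is entirely hypothetical: the document names only "get", "put" and "describe" as example types, and a real categorization would be system-specific. Treating unknown types as VM abstraction mutating is an added assumption, chosen so that an unrecognized request is never silently allowed during the flip.

```python
from enum import Enum


class RequestClass(Enum):
    NON_MUTATING = "non-mutating"                         # 1108
    VM_ABSTRACTION_MUTATING = "vm-abstraction-mutating"   # 1114
    VM_INSTANCE_MUTATING = "vm-instance-mutating"         # 1132
    UNBLOCKABLE = "unblockable"                           # 1138


# Hypothetical request-type table; the real categorization is system-specific.
REQUEST_TYPES = {
    "describe": RequestClass.NON_MUTATING,
    "get": RequestClass.NON_MUTATING,
    "start": RequestClass.VM_ABSTRACTION_MUTATING,
    "stop": RequestClass.VM_ABSTRACTION_MUTATING,
    "pause": RequestClass.VM_ABSTRACTION_MUTATING,
    "put_network_interface": RequestClass.VM_ABSTRACTION_MUTATING,
    "put_domain_config": RequestClass.VM_INSTANCE_MUTATING,
    "guest_shutdown": RequestClass.UNBLOCKABLE,
}


def classify(request_type: str) -> RequestClass:
    # Unknown types are treated conservatively as abstraction mutating (assumption).
    return REQUEST_TYPES.get(request_type, RequestClass.VM_ABSTRACTION_MUTATING)
```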

As described above, requests that are non-mutating 1108 do not cause any changes to the virtual machine instance or to the user visible abstraction of that virtual machine instance. Requests that, for example, describe resources or provide other such information are non-mutating 1108. Requests that are non-mutating 1108 are always allowed 1110 and sent to the source VM (i.e., the virtual machine instance at the source location) for processing. A request received from a user or customer while that user or customer is interacting with a virtual machine may be referred to herein as a "customer-initiated request." An application programming interface request (or API request) received in the same circumstances may be referred to herein as a "customer-initiated application programming interface request" or as a "customer-initiated API request."

Requests that are VM abstraction mutating 1114 cause changes to the user visible abstraction of the virtual machine instance. A user visible abstraction of a virtual machine instance should remain invariant during migration. Which instance backs the abstraction depends on the stage of the migration:
- Before the migration, the abstraction is backed by the virtual machine instance at the source location.
- During the migration, the abstraction is also backed by the virtual machine instance at the source location. During the flip, however, that instance (and thus the user visible abstraction) may be locked.
- After a successful migration, the abstraction is backed by the virtual machine instance at the target location.
- After a failed or cancelled migration, the abstraction is backed by the virtual machine instance at the source location.

Requests that are VM abstraction mutating 1114 change the visible state of the virtual machine instance by, for example, pausing, stopping, or starting the virtual machine instance. Requests that change the state of a network interface or a storage volume are also VM abstraction mutating 1114. Requests that are VM abstraction mutating 1114 will cause the version number of the virtual machine instance to change. They will generally also cause corresponding changes to the virtual machine instance that is backing the VM abstraction. For example, a call that changes the state of a network interface in the user visible VM abstraction may also cause a corresponding change to be made to the virtual machine instance at the source location. Requests that are VM abstraction mutating 1114 may be allowed if, for example, the underlying instances are not locked for the flip.

When requests that are VM abstraction mutating 1114 are allowed 1116, an attempt may be made to synchronize 1120 the changes made by the call to both the source VM 1124 (i.e., the virtual machine instance at the source location) and the target VM 1126 (i.e., the virtual machine instance at the target location). For example, the packet forwarding described herein may be used to synchronize 1120 the source VM 1124 and the target VM 1126, and version numbers may be used to aid in this synchronization 1120. Allowed requests 1116 that are VM abstraction mutating 1114 may also cause the migration to be cancelled 1122. When the migration is cancelled 1122, these requests may be sent to the source VM 1128, but not to the target VM 1130.

Requests that are VM abstraction mutating 1114 will always be blocked if the virtual machine instance is locked for the flip as described herein. Blocked requests of this kind may be rejected (e.g., have a rejection response sent). Alternatively, they may be added to a request queue that holds an ordered list of pending requests, to be processed after the virtual machine instance lock is released. Although not illustrated in FIG. 11, blocked VM abstraction mutating requests 1114 may also cause the migration to be cancelled as described herein. This may happen if, for example, allowing such requests would make the migration excessively complicated or would make it take too long.
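The two treatments of blocked requests, rejection or an ordered pending queue, might be sketched as a small gate. This is a hypothetical illustration. The `LockGate` class and its string results are invented for the example, and the `applied` list stands in for synchronizing a change to the source and target. Queued requests are replayed in arrival order once the flip lock is released.

```python
from collections import deque


class LockGate:
    """Hypothetical gate for mutating requests on a migrating instance.

    While the flip lock is held, blocked requests are either rejected or kept
    in an ordered queue and replayed once the lock is released.
    """

    def __init__(self, queue_blocked: bool = True) -> None:
        self.locked = False
        self.queue_blocked = queue_blocked
        self.pending: deque = deque()
        self.applied: list = []

    def submit(self, request: str) -> str:
        if not self.locked:
            self.applied.append(request)   # allowed: synchronized to source and target
            return "allowed"
        if self.queue_blocked:
            self.pending.append(request)   # blocked: keep the original arrival order
            return "queued"
        return "rejected"                  # blocked: a rejection response is sent

    def release(self) -> None:
        self.locked = False
        while self.pending:                # replay pending requests in order
            self.applied.append(self.pending.popleft())
```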

Requests that are VM instance mutating 1132 cause changes to the source domain (i.e., the virtual machine instance at the source), but not to the user visible abstraction of the virtual machine instance. Such requests do not cause the virtual machine version number to change, but may cause a domain version number to change. Such requests may be generated with an expected, or target, virtual machine version number. The request can then be allowed or rejected depending on whether the domain changes are being made to the intended virtual machine instance version. Including the target virtual machine version number in a VM instance mutating call may ensure that no alteration is made to a virtual machine instance whose user visible abstraction has changed.

For example, a call that is VM instance mutating 1132 may be generated to make a change to a file backed by a block storage device provided by a block storage service. Suppose the call specifies target virtual machine version number one, but when the call is received, the virtual machine version number is two. The change in version number may then be the result of a VM abstraction mutating call that altered the availability of that block storage device. In an embodiment where the target virtual machine version number of VM instance mutating requests is optional, VM instance mutating requests can be allowed while the virtual machine is not locked during migration and rejected while it is locked. A majority of the requests made by the migration manager are VM instance mutating 1132, rather than VM abstraction mutating.
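The version check for VM instance mutating requests might be sketched as follows, assuming a simple record that holds a VM version, a domain version and a lock flag. The `Instance` dataclass and `apply_instance_mutation` are hypothetical names. The sketch follows the behavior described above. A request that names a stale target VM version is rejected. A request with no target version is accepted only while the instance is unlocked. An accepted request bumps the domain version but never the VM version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    vm_version: int = 1        # changes only on VM abstraction mutating requests
    domain_version: int = 1    # changes on VM instance mutating requests
    locked: bool = False       # True while the instance is locked for the flip


def apply_instance_mutation(inst: Instance, target_vm_version: Optional[int]) -> bool:
    """Allow a VM instance mutating request only if it targets the current VM version."""
    if inst.locked:
        return False                                  # rejected while locked for the flip
    if target_vm_version is not None and target_vm_version != inst.vm_version:
        return False                                  # the abstraction changed underneath
    inst.domain_version += 1                          # domain changes; VM version does not
    return True
```

In the block storage example above, a call carrying target version one that arrives after the VM version has become two is rejected. It therefore cannot alter a device whose availability has since changed.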

As described above, requests that are VM instance mutating 1132 may be allowed 1134 or blocked 1136. Blocked requests 1136 of this kind may be rejected (e.g., have a rejection response sent), or they may be queued for processing after the virtual machine instance lock is released. They may also cause the migration to be cancelled as described herein.

As with allowed requests 1116 that are VM abstraction mutating 1114, when requests that are VM instance mutating 1132 are allowed 1134, an attempt may be made to synchronize 1120 the changes made by the call to both the source VM 1124 (i.e., the virtual machine instance at the source location) and the target VM 1126 (i.e., the virtual machine instance at the target location). Allowed requests 1134 that are VM instance mutating 1132 may also cause the migration to be cancelled 1122. When the migration is cancelled 1122, these requests may be sent to the source VM 1128, but not to the target VM 1130.

Requests that are unblockable 1138 are mutating requests that may not be safely blocked because, for example, the system is configured to not allow blocking of such requests.

Unblockable requests may cause a change in the virtual machine version number and may require special cleanup procedures by the migration manager. Requests that are unblockable 1138 may be allowed 1140, but may cause the migration to be cancelled 1142. Such requests may then be sent to the source VM 1144, but not to the target VM 1146. Although not illustrated in FIG. 11, requests that are unblockable 1138 may also be allowed 1140 without causing the migration to be cancelled. For example, a call to halt a virtual machine instance may be issued from within the virtual machine instance (e.g., a Unix 'shutdown -h now' command). Such a call may be logged and, after the migration has completed, executed on the target domain. The result is a successful migration followed by a shutdown.
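The two options for unblockable requests can be sketched side by side. The options are either to cancel the migration and apply the request to the source only, or to log the request and replay it on the target once the migration completes. This is a hypothetical illustration. `UnblockableLog` and the `send_to_source`/`send_to_target` callbacks are invented names, and the single `cancel_on_unblockable` flag is an assumed simplification of whatever policy selects between the two behaviors.

```python
from typing import Callable


class UnblockableLog:
    """Hypothetical handling of unblockable requests during a migration."""

    def __init__(self, cancel_on_unblockable: bool) -> None:
        self.cancel_on_unblockable = cancel_on_unblockable
        self.cancelled = False
        self.deferred: list = []

    def on_request(self, request: str, send_to_source: Callable[[str], None]) -> None:
        if self.cancel_on_unblockable:
            self.cancelled = True            # cancel the migration...
            send_to_source(request)          # ...and apply the request to the source only
        else:
            self.deferred.append(request)    # log it for execution after the migration

    def on_migration_complete(self, send_to_target: Callable[[str], None]) -> None:
        for request in self.deferred:
            send_to_target(request)          # e.g. the shutdown runs on the target domain
        self.deferred.clear()
```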

FIG. 12 illustrates an example environment 1200 where resources associated with a virtual machine instance migration are managed as described in FIG. 1 and in accordance with at least one embodiment. The example environment 1200 represents the first part of a migration, such as the migration described herein. A user may have access to a virtual machine abstraction 1202 backed by an original VM instance 1206 at a source location 1204. The original VM instance 1206 may include a network interface 1208 and one or more storage locations 1210. During migration, the user may have the same access to a virtual machine abstraction 1212 backed by the original VM instance 1216 at a source location 1214. The original VM instance 1216 may still include a network interface 1218 and one or more storage locations 1220. However, the network interface 1218 may be shared with a new VM instance 1228 at a target location 1226 and/or may be duplicated as the network interface 1224.

The network interface 1218 and the network interface 1224 may be the same network interface from the perspective of the virtual machine abstraction and/or the user. The migration manager may manage which is the active interface and which is the standby interface during the course of the migration. For example, prior to the flip, the network interface 1218 may be the active interface and the network interface 1224 may be the standby interface. After the flip, the roles may be reversed: the network interface 1224 becomes active and the network interface 1218 becomes standby. Additionally, the one or more storage locations 1220 may be shared between the original VM instance 1216 and the new VM instance 1228. During migration, memory and/or state information may be copied and forwarded 1222 from the original VM instance 1216 to the new VM instance 1228 as described herein.
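The active/standby arrangement might be sketched as one user-visible interface with two underlying attachments, where the flip swaps which attachment is active. This is a minimal illustration. `SharedInterface` and the attachment labels used in the usage note are hypothetical, and a real swap would involve network mapping changes rather than a field assignment.

```python
from dataclasses import dataclass, field


@dataclass
class SharedInterface:
    """One user-visible network interface backed by two attachments.

    The attachment names are hypothetical; the migration manager decides
    which attachment carries traffic at any moment.
    """
    source_attachment: str
    target_attachment: str
    active: str = field(init=False)

    def __post_init__(self) -> None:
        self.active = self.source_attachment      # before the flip, the source side is active

    @property
    def standby(self) -> str:
        return (self.target_attachment if self.active == self.source_attachment
                else self.source_attachment)

    def flip(self) -> None:
        self.active = self.target_attachment      # after the flip, the target side is active
```

For example, `SharedInterface("nic-1218", "nic-1224")` starts with `"nic-1218"` active. After `flip()`, `"nic-1224"` is active and `"nic-1218"` is on standby.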

FIG. 13 illustrates an example environment 1300 where resources associated with a virtual machine instance migration are managed as described in FIG. 1 and in accordance with at least one embodiment. The example environment 1300 represents the second part of a migration, such as the migrations described herein. A user may have access to a virtual machine abstraction 1302. Because the migration is nearing completion, however, the virtual machine abstraction 1302 may be backed by a new VM instance 1320 at a target location 1318. The new VM instance 1320 may have a network interface 1322 (which may be the same as the network interface 1308 as described above in connection with FIG. 12) and may have access 1324 to one or more storage locations 1312. The network interface 1308 may be the active network interface and the network interface 1322 may be the standby network interface. Meanwhile, the original VM instance 1306 at the source location 1304 may be in the process of being torn down. For example:
- the connection 1310 to the network interface 1308 may be terminated;
- the connection 1314 to the one or more storage locations 1312 may be removed; and
- the packet forwarding 1316 from the original VM instance to the new VM instance may be stopped after the original VM instance 1306 has converged.

After the successful migration, the user may have access to a virtual machine abstraction 1326 backed by the new VM instance 1330 at the target location 1328. Except for the different location, this new VM instance 1330 should appear to be the same as the original VM instance 1206 described in connection with FIG. 12, with a new active network interface 1334 and access to one or more storage locations 1332.

FIG. 14 illustrates aspects of an example environment 1400 for implementing aspects in accordance with various embodiments. As will be appreciated, although a web-based environment is used for purposes of explanation, different environments may be used, as appropriate, to implement various embodiments. The environment includes an electronic client device 1402, which can include any appropriate device operable to send and/or receive requests, messages, or information over an appropriate network 1404 and, in some embodiments, convey information back to a user of the device. Examples of such client devices include personal computers, cell phones, handheld messaging devices, laptop computers, tablet computers, set-top boxes, personal data assistants, embedded computer systems, electronic book readers, and the like. The network can include any appropriate network, including an intranet, the Internet, a cellular network, a local area network, a satellite network or any other such network and/or combination thereof. Components used for such a system can depend at least in part upon the type of network and/or environment selected. Protocols and components for communicating via such a network are well known and will not be discussed herein in detail. Communication over the network can be enabled by wired or wireless connections and combinations thereof. In this example, the network includes the Internet, as the environment includes a web server 1406 for receiving requests and serving content in response thereto. For other networks, an alternative device serving a similar purpose could be used, as would be apparent to one of ordinary skill in the art.

The illustrative environment includes at least one application server 1408 and a data store 1410. It should be understood that there can be several application servers, layers or other elements, processes or components, which may be chained or otherwise configured, and which can interact to perform tasks such as obtaining data from an appropriate data store. Servers, as used herein, may be implemented in various ways, such as hardware devices or virtual computer systems. In some contexts, servers may refer to a programming module being executed on a computer system. As used herein, unless otherwise stated or clear from context, the term "data store" refers to any device or combination of devices capable of storing, accessing and retrieving data. A data store may include any combination and number of data servers, databases, data storage devices and data storage media, in any standard, distributed, virtual or clustered environment. The application server can include any appropriate hardware, software and firmware for integrating with the data store as needed to execute aspects of one or more applications for the client device, handling some or all of the data access and business logic for an application. The application server may provide access control services in cooperation with the data store. It is able to generate content including, but not limited to, text, graphics, audio, video and/or other content usable to be provided to the user. This content may be served to the user by the web server in the form of HyperText Markup Language ("HTML"), Extensible Markup Language ("XML"), JavaScript, Cascading Style Sheets ("CSS") or another appropriate client-side structured language. Content transferred to a client device may be processed by the client device to provide the content in one or more forms including, but not limited to, forms that are perceptible to the user audibly, visually and/or through other senses including touch, taste, and/or smell.

In this example, the handling of all requests and responses, as well as the delivery of content between the client device 1402 and the application server 1408, can be handled by the web server using PHP: Hypertext Preprocessor ("PHP"), Python, Ruby, Perl, Java, HTML, XML, or another appropriate server-side structured language. It should be understood that the web and application servers are not required and are merely example components, as structured code discussed herein can be executed on any appropriate device or host machine as discussed elsewhere herein. Further, operations described herein as being performed by a single device may, unless otherwise clear from context, be performed collectively by multiple devices, which may form a distributed and/or virtual system.

The data store 1410 can include several separate data tables, databases, data documents, dynamic data storage schemes and/or other data storage mechanisms and media for storing data relating to a particular aspect of the present disclosure. For example, the data store illustrated may include mechanisms for storing production data 1412 and user information 1416, which can be used to serve content for the production side. The data store also is shown to include a mechanism for storing log data 1414, which can be used for reporting, analysis, or other such purposes. It should be understood that there can be many other aspects that may need to be stored in the data store, such as page image information and access rights information, which can be stored in any of the above-listed mechanisms as appropriate or in additional mechanisms in the data store 1410. The data store 1410 is operable, through logic associated therewith, to receive instructions from the application server 1408 and obtain, update or otherwise process data in response thereto. The application server 1408 may provide static, dynamic, or a combination of static and dynamic data in response to the received instructions. Dynamic data, such as data used in web logs (blogs), shopping applications, news services and other such applications, may be generated by server-side structured languages as described herein or may be provided by a content management system (“CMS”) operating on, or under the control of, the application server. In one example, a user, through a device operated by the user, might submit a search request for a certain type of item. In this case, the data store might access the user information to verify the identity of the user and can access the catalog detail information to obtain information about items of that type. The information then can be returned to the user, such as in a results listing on a web page that the user is able to view via a browser on the user device 1402. Information for a particular item of interest can be viewed in a dedicated page or window of the browser. It should be noted, however, that embodiments of the present disclosure are not necessarily limited to the context of web pages, but may be more generally applicable to processing requests in general, where the requests are not necessarily requests for content.

Each server typically will include an operating system that provides executable program instructions for the general administration and operation of that server and typically will include a computer-readable storage medium (e.g., a hard disk, random access memory, read-only memory, etc.) storing instructions that, when executed by a processor of the server, allow the server to perform its intended functions. Suitable implementations for the operating system and general functionality of the servers are known or commercially available and are readily implemented by persons having ordinary skill in the art, particularly in light of the disclosure herein.

The environment, in one embodiment, is a distributed and/or virtual computing environment utilizing several computer systems and components that are interconnected via communication links, using one or more computer networks or direct connections. However, it will be appreciated by those of ordinary skill in the art that such a system could operate equally well in a system having fewer or a greater number of components than are illustrated in FIG. 14. Thus, the depiction of the system illustrated in example environment 1400 in FIG. 14 should be taken as being illustrative in nature and not limiting to the scope of the disclosure.

The various embodiments further can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices or processing devices which can be used to operate any of a number of applications. User or client devices can include any of a number of general purpose personal computers, such as desktop, laptop or tablet computers running a standard operating system, as well as cellular, wireless and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system also can include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management. These devices also can include other electronic devices, such as dummy terminals, thin-clients, gaming systems and other devices capable of communicating via a network. These devices also can include virtual devices such as virtual machines, hypervisors and other virtual devices capable of communicating via a network.

Various embodiments of the present disclosure utilize at least one network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially-available protocols, such as Transmission Control Protocol/Internet Protocol (“TCP/IP”), User Datagram Protocol (“UDP”), protocols operating in various layers of the Open Systems Interconnection (“OSI”) model, File Transfer Protocol (“FTP”), Universal Plug and Play (“UPnP”), Network File System (“NFS”), Common Internet File System (“CIFS”), and AppleTalk. The network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, a satellite network, and any combination thereof.

In embodiments utilizing a web server, the web server can run any of a variety of server or mid-tier applications, including Hypertext Transfer Protocol (“HTTP”) servers, FTP servers, Common Gateway Interface (“CGI”) servers, data servers, Java servers, Apache servers, and business application servers. The server(s) also may be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more web applications that may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C#, or C++, or any scripting language, such as Ruby, PHP, Perl, Python, or TCL, as well as combinations thereof. The server(s) may also include database servers, including without limitation those commercially available from Oracle®, Microsoft®, Sybase®, and IBM® as well as open-source servers such as MySQL, Postgres, SQLite, MongoDB, and any other server capable of storing, retrieving, and accessing structured or unstructured data. Database servers may include table-based servers, document-based servers, unstructured servers, relational servers, non-relational servers or combinations of these and/or other database servers.

The environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (“SAN”) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network devices may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (“CPU” or “processor”), at least one input device (e.g., a mouse, keyboard, controller, touch screen or keypad) and at least one output device (e.g., a display device, printer or speaker). Such a system may also include one or more storage devices, such as disk drives, optical storage devices and solid-state storage devices such as random access memory (“RAM”) or read-only memory (“ROM”), as well as removable media devices, memory cards, flash cards, etc.

Such devices also can include a computer-readable storage media reader, a communications device (e.g., a modem, a network card (wireless or wired), an infrared communication device, etc.), and working memory as described above. The computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium, representing remote, local, fixed, and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services or other elements located within at least one working memory device, including an operating system and application programs, such as a client application or web browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets) or both. Further, connection to other computing devices such as network input/output devices may be employed.

Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as, but not limited to, volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules or other data, including RAM, ROM, Electrically Erasable Programmable Read-Only Memory (“EEPROM”), flash memory or other memory technology, Compact Disc Read-Only Memory (“CD-ROM”), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices or any other medium which can be used to store the desired information and which can be accessed by the system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.

Other variations are within the spirit of the present disclosure. Thus, while the disclosed techniques are susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific form or forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions and equivalents falling within the spirit and scope of the invention, as defined in the appended claims.

The use of the terms “a” and “an” and “the” and similar referents in the context of describing the disclosed embodiments (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. The term “connected,” when unmodified and referring to physical connections, is to be construed as partly or wholly contained within, attached to or joined together, even if there is something intervening. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. The use of the term “set” (e.g., “a set of items”) or “subset,” unless otherwise noted or contradicted by context, is to be construed as a nonempty collection comprising one or more members. Further, unless otherwise noted or contradicted by context, the term “subset” of a corresponding set does not necessarily denote a proper subset of the corresponding set, but the subset and the corresponding set may be equal.

Conjunctive language, such as phrases of the form “at least one of A, B, and C,” or “at least one of A, B and C,” unless specifically stated otherwise or otherwise clearly contradicted by context, is otherwise understood with the context as used in general to present that an item, term, etc., may be either A or B or C, or any nonempty subset of the set of A and B and C. For instance, in the illustrative example of a set having three members, the conjunctive phrases “at least one of A, B, and C” and “at least one of A, B and C” refer to any of the following sets: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}. Thus, such conjunctive language is not generally intended to imply that certain embodiments require at least one of A, at least one of B and at least one of C each to be present.

Operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. Processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs or one or more applications) executing collectively on one or more processors, by hardware or combinations thereof. The code may be stored on a computer-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable storage medium may be non-transitory.

The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.

Embodiments of this disclosure are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate and the inventors intend for embodiments of the present disclosure to be practiced otherwise than as specifically described herein. Accordingly, the scope of the present disclosure includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the scope of the present disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.

All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.

What is claimed is:
 1. A computer-implemented method, comprising: determining that a virtual machine running on a source host is to be migrated away from the source host; and migrating the virtual machine away from the source host at least by: selecting a target host for the virtual machine; copying, while the virtual machine continues to run on the source host, a state of the virtual machine from the source host to the target host; propagating, to the target host, a change to the state of the virtual machine, the change resulting from the virtual machine running on the source host during the copying; and running the virtual machine on the target host such that the virtual machine running on the target host includes the change to the state.
 2. The computer-implemented method of claim 1, wherein the change to the state of the virtual machine occurs as a result of fulfillment of a request received at the virtual machine running on the source host during the copying.
 3. The computer-implemented method of claim 1, wherein: the computer-implemented method further comprises pausing the virtual machine running on the source host; and propagating the change to the target host further comprises propagating the change to the target host while the virtual machine running on the source host is paused.
 4. The computer-implemented method of claim 1, wherein determining that the virtual machine is to be migrated away from the source host is based at least in part on a change in availability of a resource associated with the source host.
 5. The computer-implemented method of claim 4, wherein the change in availability of the resource is due to an insufficiency of at least one of: computing power, memory, or network bandwidth.
 6. The computer-implemented method of claim 1, wherein selecting the target host is based at least in part on a capability of the target host.
 7. The computer-implemented method of claim 6, wherein migrating the virtual machine further comprises determining that the capability of the target host is sufficient to fulfill a desired capability specified for the virtual machine.
 8. The computer-implemented method of claim 7, wherein the desired capability is one or more of: a type of processor to run the virtual machine, a quantity of processors to run the virtual machine, or an amount of memory to allocate to the virtual machine.
 9. A system, comprising: one or more processors; and memory including executable instructions that, if executed by the one or more processors, cause the system to: determine to perform a migration of a virtual machine away from a source host; identify a target host for the migration; and perform the migration by causing the system to at least: while a first instance of the virtual machine continues to execute on the source host, copy a state of the first instance of the virtual machine from the source host to the target host; propagate, to the target host, a change to the state of the virtual machine, the change resulting during execution of the first instance during copying the state; and execute a second instance of the virtual machine on the target host such that the second instance includes the change to the state.
 10. The system of claim 9, wherein the executable instructions further cause the system to: terminate the first instance; and reclaim a resource associated with the first instance.
 11. The system of claim 9, wherein the executable instructions further cause the system to identify the target host based at least in part on resource availability, location, or operation cost.
 12. The system of claim 9, wherein the executable instructions further cause the system to begin execution of the second instance during copying the state.
 13. The system of claim 9, wherein the executable instructions further cause the system to: obtain, during copying the state, a request directed to the first instance; determine, based at least in part on a type of the request, to perform an operation that results in the change to the state of the first instance; and perform the operation.
 14. The system of claim 13, wherein the executable instructions further cause the system to at least: lock the first instance of the virtual machine; compare the state of the first instance of the virtual machine to a second state of the second instance of the virtual machine; determine that a difference between the state and the second state exceeds a threshold; and in response to determining that the difference exceeds the threshold, cancel migration of the virtual machine.
 15. The system of claim 9, wherein the executable instructions further cause the system to: lock the first instance of the virtual machine; determine that the state of the first instance including the change has been copied to the second instance of the virtual machine; and forward a request directed to the virtual machine to the second instance.
 16. The system of claim 9, wherein the executable instructions further cause the system to: pause the first instance of the virtual machine; and propagate, to the target host while the first instance of the virtual machine is paused, the change to the state of the virtual machine.
 17. One or more non-transitory computer-readable storage media storing executable instructions that, if executed by one or more processors, cause the one or more processors to at least: receive a command to migrate a virtual machine running on a source host to a target host; copy, while the virtual machine continues to run on the source host, a state of the virtual machine from the source host to the target host; identify a change to the state of the virtual machine resulting from the virtual machine running on the source host during copying the state; and propagate the change to the target host.
 18. The one or more non-transitory computer-readable storage media of claim 17, wherein the executable instructions further cause the one or more processors to: determine that the virtual machine running on the source host is to be migrated away to fulfill a request from a customer associated with the virtual machine, the request indicating to reduce a cost associated with the virtual machine running on the source host; and select the target host as a result of a determination that a cost associated with the virtual machine running on the target host is reduced compared to the cost associated with the virtual machine running on the source host.
 19. The one or more non-transitory computer-readable storage media of claim 17, wherein the executable instructions further cause the one or more processors to: detach a resource from the virtual machine running on the source host; and attach the resource to the virtual machine running on the target host.
 20. The one or more non-transitory computer-readable storage media of claim 17, wherein the executable instructions further cause the one or more processors to: determine that the virtual machine is to be moved logically closer to a computing resource; and select the target host at least in part as a result of the target host being logically closer to the computing resource.
 21. The one or more non-transitory computer-readable storage media of claim 17, wherein the executable instructions further cause the one or more processors to: obtain, while the virtual machine continues to run on the source host during copying the state, a request directed to the virtual machine running on the source host; determine, based at least in part on the request, an operation to perform; and perform the operation, resulting in the change to the state of the virtual machine.
 22. The one or more non-transitory computer-readable storage media of claim 17, wherein the executable instructions further cause the one or more processors to: lock the virtual machine running on the source host; determine that the state of the virtual machine including the change has been copied to the virtual machine running on the target host; and forward a request directed to the virtual machine to the target host.
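The claims above describe copying a virtual machine's state while it keeps running, then propagating the changes made during the copy. The Python sketch below illustrates one way those steps could fit together, under simplifying assumptions that are not taken from the disclosure. VM state is modeled as an in-memory dictionary of pages. Requests served during the copy are simulated with a callback. Target selection is a simple capability check, loosely following claims 6 to 8. Every name (`Host`, `VirtualMachine`, `select_target`, `migrate`, `run_during_copy`, `max_rounds`, `cancel_threshold`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Host:
    """Hypothetical candidate host and its capabilities."""
    name: str
    processor_type: str
    processor_count: int
    memory_mib: int


@dataclass
class VirtualMachine:
    """Hypothetical VM whose state is modeled as a dict of memory pages."""
    processor_type: str      # desired capabilities (cf. claims 7 and 8)
    processor_count: int
    memory_mib: int
    state: Dict[str, bytes] = field(default_factory=dict)
    dirty: Set[str] = field(default_factory=set)  # pages written since the last copy
    paused: bool = False

    def handle_write(self, page_id: str, value: bytes) -> None:
        """Fulfill a request on the source; this changes the VM state (cf. claim 2)."""
        if self.paused:
            raise RuntimeError("VM is paused")
        self.state[page_id] = value
        self.dirty.add(page_id)


def select_target(vm: VirtualMachine, candidates: List[Host]) -> Optional[Host]:
    """Return the first host whose capabilities satisfy the VM (cf. claims 6 to 8)."""
    for host in candidates:
        if (host.processor_type == vm.processor_type
                and host.processor_count >= vm.processor_count
                and host.memory_mib >= vm.memory_mib):
            return host
    return None


def migrate(vm: VirtualMachine,
            candidates: List[Host],
            run_during_copy: Callable[[VirtualMachine], None],
            max_rounds: int = 3,
            cancel_threshold: int = 10) -> Optional[Tuple[Host, Dict[str, bytes]]]:
    """Return (target host, state copied to it), or None if migration is abandoned."""
    target = select_target(vm, candidates)
    if target is None:
        return None  # no capable target host

    # Full copy while the VM keeps running on the source host.
    vm.dirty.clear()
    target_state = dict(vm.state)
    run_during_copy(vm)  # stands in for requests served during the copy

    # Further rounds resend only the pages changed during the previous round.
    for _ in range(max_rounds):
        if not vm.dirty:
            break
        changed, vm.dirty = vm.dirty, set()
        for page_id in changed:
            target_state[page_id] = vm.state[page_id]
        run_during_copy(vm)

    # Pause (lock) the source and compare source and target state (cf. claim 14).
    vm.paused = True
    diff = {p for p, v in vm.state.items() if target_state.get(p) != v}
    if len(diff) > cancel_threshold:
        vm.paused = False  # cancel: the VM keeps running on the source host
        return None

    # Propagate the remaining changes while the source is paused (cf. claims 3, 16).
    for page_id in diff:
        target_state[page_id] = vm.state[page_id]
    vm.dirty.clear()
    return target, target_state


# Usage: a workload that rewrites one page each time it runs.
writes = {"count": 0}


def workload(vm: VirtualMachine) -> None:
    writes["count"] += 1
    vm.handle_write("page-0", f"v{writes['count']}".encode())


hosts = [Host("small", "x86_64", 1, 8192), Host("large", "x86_64", 4, 8192)]
vm = VirtualMachine("x86_64", 2, 4096, state={"page-0": b"v0", "page-1": b"a"})
result = migrate(vm, hosts, workload)
```

In the usage above, the "small" host is skipped because it has too few processors. The page rewritten during each round is re-sent in the next round, and the final change is propagated while the source is paused, so the state on "large" matches the source. A production system would typically rely on the hypervisor's own dirty-page tracking rather than an explicit set. It would also redirect requests to the target after switchover, as in claims 15 and 22. Both of these are omitted from the sketch.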